Measuring Inheritance in Java

Every now and again when reading someone's comments about design quality, the comments include some mention of use of inheritance, in an "of course everyone knows use of inheritance indicates good quality" kind of way. OO proponents have been pushing this line since forever, but some of these comments are from people I feel ought to know better, which makes me wonder what I'm missing. These comments seem to imply we should be using inheritance as much as possible, but that seems silly, so what is really meant?

Since I do software measurement, it naturally occurs to me to try to find evidence, in the form of measurement, to support claims about use of inheritance. This immediately raises the question of what to measure. My starting point is to want to know how much inheritance actually takes place. What metrics might I use to determine this?

In fact there is already an "obvious" answer - use some of the CK metrics [CK91]. Two of the metrics they proposed seem relevant: DIT (Depth of Inheritance Tree) and NOC (Number of Children). Their definitions are given below (actually based on the journal version of the paper [CK94]). For a class C:

DIT
DIT(C) = depth of C in the inheritance tree
where the depth of a node of a tree is the length of the maximal path from the node to the root of the tree.
NOC
NOC(C) = Number of immediate descendants of the class
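
As a rough illustration, both definitions can be approximated for loaded classes with Java reflection. This is a minimal sketch, not the tool used for the measurements later in this article; the class name CkMetrics and the idea of passing in an explicit "universe" of classes are assumptions made for the example. Because a Java class has exactly one superclass, the path to the root is unique, so the "maximal path" is just the length of the superclass chain.

```java
import java.util.Collection;

final class CkMetrics {
    // DIT: number of superclass links from c up to java.lang.Object.
    // Interfaces and Object itself have no superclass, so they get depth 0.
    static int dit(Class<?> c) {
        int depth = 0;
        for (Class<?> k = c.getSuperclass(); k != null; k = k.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    // NOC: number of classes in the given universe whose direct superclass is c.
    // The universe must be supplied by the caller; reflection cannot enumerate subclasses.
    static int noc(Class<?> c, Collection<? extends Class<?>> universe) {
        int children = 0;
        for (Class<?> k : universe) {
            if (k.getSuperclass() == c) {
                children++;
            }
        }
        return children;
    }
}
```

For example, CkMetrics.dit(java.util.ArrayList.class) is 3, because ArrayList extends AbstractList, which extends AbstractCollection, which extends Object.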

So how can we use these metrics to get an idea for how much inheritance takes place? Using DIT is complicated by the use of "library" classes, that is, libraries used in the application but not developed by the development team. The issue here is that a class can have a DIT of, say, 5, but that's because an application inherits from a library class with a DIT of 4. It's not at all clear to me what we're learning about the application's use of inheritance by considering DIT. (Which isn't to say DIT is useless, but just not useful for what I want.)

NOC doesn't have problems due to library classes. The main problem I have with it (and this applies to DIT too) is that it measures only a single class, whereas to understand the quality of the design of an application, we need to measure the whole application. We can look at distributions of these measures (as was done in the later CK paper [CK94]), but this still leaves me with the question of how to interpret these distributions. For example, is the fact that they might follow a power-law distribution important?

OK, if DIT and NOC aren't going to be that useful, what should we measure? I would argue that a good start is "how many classes are defined using inheritance", that is, how many classes extend some class other than Object. The reasoning is that, if we didn't have inheritance, it would require more work to implement these classes than if we have inheritance (assuming it was possible to implement them at all without inheritance).

The issue of library classes must still be considered. Of those classes that inherit from something (other than Object), some may inherit from library classes and the rest will inherit from other classes in the application. Classes that inherit from library classes benefit from using the library classes, whereas there is less benefit accrued from inheriting from other application classes, as those other classes also have to be created. On the other hand, one application class inheriting from another application class represents a design decision under the control of the development team, and so potentially says something about the quality of the design. I think it is worth recording the distinction in any measurement, as I will do below.
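
One way to record that distinction is to classify the direct superclass of each class by where it comes from. The sketch below is only an illustration: deciding the origin from package-name prefixes, the appPrefix parameter, and the names ParentOrigin and Origin are all assumptions of mine, and a real tool would more likely decide based on which jar or source tree a class was found in.

```java
final class ParentOrigin {
    enum Origin { NONE, STANDARD_LIBRARY, THIRD_PARTY, USER_DEFINED }

    // Classifies the direct superclass of c.
    // appPrefix is a hypothetical package prefix identifying application code, e.g. "org.jgraph.".
    static Origin of(Class<?> c, String appPrefix) {
        Class<?> parent = c.getSuperclass();
        if (parent == null || parent == Object.class) {
            return Origin.NONE; // not "defined using inheritance" in the sense used here
        }
        String name = parent.getName();
        if (name.startsWith(appPrefix)) {
            return Origin.USER_DEFINED;
        }
        // Assumption: treat the java.* and javax.* packages as the standard library.
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return Origin.STANDARD_LIBRARY;
        }
        return Origin.THIRD_PARTY;
    }
}
```

Counting how many application classes fall into each non-NONE bucket gives three separate numbers rather than one, which is the distinction argued for above.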

We can also consider how many classes get inherited from, which is essentially the "inverse" of how many classes are defined using inheritance from application classes. It is also the number of classes with NOC > 0. This also indicates something about the design decisions made with respect to inheritance in the application.
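
A minimal sketch of this "inherited from" count, assuming the application's classes are already available as a collection (how they are collected is outside the sketch, and the name InheritedFrom is hypothetical). It returns the application classes that are the direct superclass of at least one other application class, which is the set of classes with NOC > 0.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class InheritedFrom {
    // Returns the classes in app that are the direct superclass of some other class in app.
    static Set<Class<?>> inheritedFrom(Collection<? extends Class<?>> app) {
        Set<Class<?>> appSet = new HashSet<>();
        for (Class<?> c : app) {
            appSet.add(c);
        }
        Set<Class<?>> parents = new HashSet<>();
        for (Class<?> c : appSet) {
            Class<?> parent = c.getSuperclass();
            // Only parents defined inside the application are of interest here.
            if (parent != null && appSet.contains(parent)) {
                parents.add(parent);
            }
        }
        return parents;
    }
}
```

Using a set means a class with many children is still counted once, so the result is a count of classes, not of inheritance relationships.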

It's worth pausing here and reflecting on what we would expect to see. Would we expect 50% of the classes to be defined using inheritance, or 5%? Would there be more classes that get inherited from, as opposed to defined using inheritance?

Since I am considering counting classes, all the issues associated with counting classes apply. I've already mentioned the need to distinguish "library" classes. Another issue is that, while I've been saying "class", I really mean "type", that is, we need to distinguish between the different kinds of type: classes, interfaces, enums, annotations, and exceptions.

This then raises the question of what exactly is meant by "inheritance". There are two flavours in Java: extends and implements. These flavours place restrictions on the relationships between different kinds of types. For example, a class can both extend another class and implement an interface, but an interface can only extend another interface and can't implement anything. As a consequence, we need to be careful interpreting results if we group all of these inheritance relationships together. Things are a bit complicated with enums and annotations because enums are really specialised classes (that extend java.lang.Enum) and annotations are specialised interfaces that extend java.lang.annotation.Annotation. Counting these implicit inheritance relationships seems misleading to me, so I don't. Exceptions are even more complicated because they are also specialised types defined by inheritance and (unlike enums and annotations) this relationship is explicit in the code. In this case I am reluctant to not count this relationship because it would be inconsistent with the source code. Finally, because enums, annotations, and exceptions are specialised interfaces or classes, this can allow some unexpected inheritance relationships. The full set is shown in the table below.

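Inheritance relationships between kinds (the row kind inherits from the column kind)
From \ To | Class | Interface | Enum | Annotation | Exception
Class | extends | implements | - | - | -
Interface | - | extends | - | extends (not recommended) | -
Enum | - | implements | - | - | -
Annotation | - | - | - | - | -
Exception | - | implements | - | - | extends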
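
To make the five kinds concrete, here is a small sketch that sorts a loaded type into one of them using reflection. The names TypeKinds and Kind are hypothetical, and treating "exception" as "any subclass of java.lang.Throwable" is my assumption about what is meant. The order of the checks matters, because an annotation is also reported as an interface by the reflection API.

```java
final class TypeKinds {
    enum Kind { CLASS, INTERFACE, ENUM, ANNOTATION, EXCEPTION }

    static Kind kindOf(Class<?> c) {
        if (c.isAnnotation()) {
            return Kind.ANNOTATION; // must come before isInterface(), which is also true here
        }
        if (c.isInterface()) {
            return Kind.INTERFACE;
        }
        if (c.isEnum()) {
            return Kind.ENUM;
        }
        // Assumption: an "exception" is any class assignable to Throwable.
        if (Throwable.class.isAssignableFrom(c)) {
            return Kind.EXCEPTION;
        }
        return Kind.CLASS;
    }
}
```

The implicit relationships are visible the same way: Thread.State.class.getSuperclass() is java.lang.Enum even though no extends clause appears in the source.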

Another issue that needs to be addressed is nested types. I initially thought there was no real need to distinguish them. If a class is being defined using inheritance then I saw no reason to care that either it is a nested class or that the class it is inheriting from is nested. Then it occurred to me that it's conceivable that a lot of the nested classes might be involved in frameworks (such as Swing), which would involve implementing framework interfaces or extending framework classes. It would be interesting to know if more inheritance occurs with nested classes or not, so I distinguish classes at different levels of nesting.
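
The nesting level of a type can be read off by walking outwards through its enclosing classes. A minimal sketch, with the class name Nesting chosen for the example; it treats local and anonymous classes like any other nested class, which is an assumption on my part.

```java
final class Nesting {
    // Level 0 is a top-level type, level 1 is nested directly inside a top-level type, and so on.
    static int level(Class<?> c) {
        int level = 0;
        for (Class<?> outer = c.getEnclosingClass(); outer != null; outer = outer.getEnclosingClass()) {
            level++;
        }
        return level;
    }
}
```

For example, Nesting.level(java.util.Map.class) is 0 and Nesting.level(java.util.Map.Entry.class) is 1.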

So, we now have 7 inheritance-flavoured relationships, and for each we can consider those types that are defined using that relationship, and those that some other type uses in that relationship. For the "using" relationship, I distinguish between types defined in the application, types that are part of the Standard API, and Third Party types. All up, that's 28 metrics! While that's maybe a bit over the top, most of the time we'll be interested in specific combinations that I'll discuss below.

Relationship | Using (Defined Using Inheritance) | Used (Inherited From)
Class-Class | SLCCDUI, TPCCDUI, UDCCDUI | CCIF
Class-Interface | SLCIDUI, TPCIDUI, UDCIDUI | CIIF
Interface-Interface | SLIIDUI, TPIIDUI, UDIIDUI | IIIF
Interface-Annotation | SLIADUI, TPIADUI, UDIADUI | IAIF
Enum-Interface | SLEIDUI, TPEIDUI, UDEIDUI | EIIF
Exception-Interface | SLExIDUI, TPExIDUI, UDExIDUI | ExIIF
Exception-Exception | SLExExDUI, TPExExDUI, UDExExDUI | ExExIF

Metric components. Each row is one of the 7 relationships identified above (whether it's extends or implements is implied by the combination of kinds of type). The columns show the two directions of the relationship. The cells of the "using" relation have the metrics in the order of: using Standard Library, using Third Party, or using User Defined.

For these metrics, the "left hand side" of the relationship being counted will always be a user-defined class, so, for example, CCDUI will never count the fact that java.util.Vector extends java.util.AbstractList.

As I have hinted at above, it would probably be more useful to have these values as proportions of the total. We can get the application size in number of classes broken down by kind of type and level of nesting. I do this both at each nesting level, and over all nesting levels in a given category.

The combinations that are probably going to be of most interest are as follows:

CCDUI = SLCCDUI + TPCCDUI + UDCCDUI

When one class extends another, it usually means that less code had to be written for that class because it is inheriting code from its ancestor classes. This metric therefore represents the number of classes that have probably directly benefited from reduced effort.

When computing proportions, the denominator for this metric is the total number of user defined classes (at all nesting levels).

UDDUI = "UDCCDUI + UDCIDUI + UDIIDUI"

Both forms of inheritance represent design decisions. This metric represents how many types defined by the developer were involved in this kind of decision. Note that this isn't really a sum, as otherwise some modules would be double counted (those classes that both extend another class and implement an interface, or those that inherit from multiple interfaces).

When computing proportions, the denominator for this metric is the total number of user defined classes and interfaces (at all nesting levels).
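
The double-counting point can be shown with a tiny made-up example. Everything in the sketch below (the types Shape, Named, Base, Circle and Square, and the class name UdduiSketch) is hypothetical. Counting types per relationship and adding the counts gives 3, while counting each type once, however many user-defined parents it has, gives 2.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class UdduiSketch {
    // Hypothetical application types, for illustration only.
    interface Shape {}
    interface Named {}
    static class Base {}
    static class Circle extends Base implements Shape, Named {}
    static class Square implements Shape {}

    static Set<Class<?>> demoApp() {
        Set<Class<?>> app = new LinkedHashSet<>();
        app.add(Shape.class);
        app.add(Named.class);
        app.add(Base.class);
        app.add(Circle.class);
        app.add(Square.class);
        return app;
    }

    // True if the direct superclass of c is itself an application type.
    static boolean extendsUserDefined(Class<?> c, Set<Class<?>> app) {
        Class<?> parent = c.getSuperclass();
        return parent != null && app.contains(parent);
    }

    // True if at least one direct superinterface of c is an application type.
    static boolean implementsUserDefined(Class<?> c, Set<Class<?>> app) {
        for (Class<?> i : c.getInterfaces()) {
            if (app.contains(i)) {
                return true;
            }
        }
        return false;
    }

    // Adds the per-relationship counts, so Circle is counted twice.
    static int naiveSum(Set<Class<?>> app) {
        int n = 0;
        for (Class<?> c : app) {
            if (extendsUserDefined(c, app)) n++;
            if (implementsUserDefined(c, app)) n++;
        }
        return n;
    }

    // Counts each type at most once: the union rather than the sum.
    static int uddui(Set<Class<?>> app) {
        int n = 0;
        for (Class<?> c : app) {
            if (extendsUserDefined(c, app) || implementsUserDefined(c, app)) n++;
        }
        return n;
    }
}
```

With the demo types, naiveSum returns 3 and uddui returns 2, out of 5 user-defined classes and interfaces.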

CIDUI = "SLCIDUI + TPCIDUI + UDCIDUI"

Implementing an interface does not directly reduce the effort of constructing the class in the way that extending another class does. Nevertheless there is an expectation of some effort saved via a mechanism that some people have called context reuse [BT96]. This metric represents how many types will benefit only from context reuse. Again we have to worry about double counting.

When computing proportions, the denominator for this metric is the total number of user defined classes (at all nesting levels).

TCCIF

The total of the CCIF measurements for all levels of nesting.

When computing proportions, the denominator is the total number of user defined classes (at all nesting levels).

TCIIF

The total of the CIIF measurements for all levels of nesting.

When computing proportions, the denominator for this metric is the total number of user defined interfaces (at all nesting levels).

TIIIF

The total of the IIIF measurements for all levels of nesting.

When computing proportions, the denominator for this metric is the total number of user defined interfaces (at all nesting levels).

DUI

The total number of modules defined using inheritance. This is the number of classes, interfaces, enums, annotations, and exceptions that extend or implement something other than java.lang.Object, java.lang.Cloneable or java.io.Serializable. Or, in other words, all of the modules of which it is reasonable to say that they "use inheritance".

When computing proportions, the denominator is the total number of user defined modules (at all nesting levels).
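
The DUI test for a single type can be sketched as a predicate over its direct superclass and direct superinterfaces. This is an illustration rather than the actual measurement tool: the name DuiSketch is hypothetical, and also ignoring the implicit java.lang.Enum and java.lang.annotation.Annotation parents is my reading of the earlier remark that those relationships are not counted.

```java
import java.io.Serializable;
import java.lang.annotation.Annotation;

final class DuiSketch {
    // Parents that do not count as "using inheritance".
    static boolean ignored(Class<?> parent) {
        return parent == Object.class
            || parent == Cloneable.class
            || parent == Serializable.class
            // Assumption: the implicit parents of enums and annotations are ignored too.
            || parent == Enum.class
            || parent == Annotation.class;
    }

    // True if c extends or implements at least one parent that is not ignored.
    static boolean usesInheritance(Class<?> c) {
        Class<?> parent = c.getSuperclass();
        if (parent != null && !ignored(parent)) {
            return true;
        }
        for (Class<?> i : c.getInterfaces()) {
            if (!ignored(i)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, java.util.BitSet implements only Cloneable and Serializable, so DuiSketch.usesInheritance(java.util.BitSet.class) is false, whereas Integer extends Number and so counts.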

IF

The total number of modules that are inherited from. This is the number of user defined classes, interfaces, enums, annotations, and exceptions that are extended from or implemented.

When computing proportions, the denominator is the total number of user defined modules (at all nesting levels).

My guess is that the remaining possible combinations of metrics are likely to have small values (mainly zero) and so I don't think there's any need to consider them.

Recently, James Noble pointed out to me that people probably use static nested classes differently from non-static nested classes (i.e. inner classes), so I decided it would be worth separating that out. This distinction does not apply at nesting level 0 (top level), so those numbers will always be zero, and nested interfaces are by definition static [JLS 8.5.2], so those aren't distinguished at all.
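
With reflection, the static/non-static distinction comes down to checking the static modifier on a nested class. The sketch below uses two hypothetical nested classes of its own (StaticNested and Inner) so that it is self-contained; the name NestedForm is likewise made up.

```java
import java.lang.reflect.Modifier;

final class NestedForm {
    static class StaticNested {} // hypothetical static nested class
    class Inner {}               // hypothetical inner (non-static) class

    // A member class declared with the static modifier.
    static boolean isStaticNested(Class<?> c) {
        return c.isMemberClass() && Modifier.isStatic(c.getModifiers());
    }

    // Any nested class that is not static; top-level classes have no enclosing class.
    static boolean isInner(Class<?> c) {
        return c.getEnclosingClass() != null && !Modifier.isStatic(c.getModifiers());
    }
}
```

A top-level class such as NestedForm itself satisfies neither predicate, which matches the observation that the distinction does not apply at nesting level 0.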

Now to apply these metrics to actual applications.

jgraph-5.10.0.1

JGraph is an "application" I have used before. Within each cell, there's a list of triplets of the form: <numerator> / <denominator> ( <percent>% ), one triplet for each level of nesting (ordered 0-nesting, 1-nesting, etc). The <numerator> is the actual measurement for that metric and level, the <denominator> is the number of relevant modules in that category, and <percent> is the proportion.
Relationship | SL/DUI | Static | TP/DUI | Static | UD/DUI | Static | IF
CC | 14/36 (39%), 12/37 (32%) | 0/36 (0%), 2/37 (5%) | 0 | 0 | 7/36 (19%), 2/37 (5%) | 0/36 (0%), 3/37 (8%) | 8/36 (22%), 1/37 (3%)
CI | 5/36 (14%), 5/37 (14%) | 0/36 (0%), 1/37 (3%) | 0 | 0 | 14/36 (39%), 10/37 (27%) | 0/36 (0%), 3/37 (8%) | 14/14 (100%), 3/3 (100%)
II | 4/14 (29%) | 0 | 0 | 0 | 2/14 (14%), 1/3 (33%) | 0 | 1/14 (7%), 1/3 (33%)
IA | 0 | 0 | 0 | 0 | 0 | 0 | 0
EI | 0 | 0 | 0 | 0 | 0 | 0 | 0
ExI | 0 | 0 | 0 | 0 | 0 | 0 | 0
ExEx | 0 | 0 | 0 | 0 | 0 | 0 | 0

Now the summary measurements. Maximum is what is used to compute the Proportion.
Metric Measurement Maximum Proportion
CCDUI 40 73 55%
UDDUI 40 90 44%
CIDUI 37 73 51%
TCCIF 9 73 12%
TCIIF 17 17 100%
TIIIF 2 17 12%
DUI 69 90 77%
IF 26 90 29%
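
As a cross-check, the CCDUI summary can be recomputed from the CC measurements. The sketch below hard-codes the jgraph CC numbers, reading them as the SL, TP and UD counts at nesting levels 0 and 1, each split into non-static and static; that reading of the table layout, and the names JgraphSummaryCheck, ccdui and triplet, are my assumptions. The percentage is rounded to the nearest whole number, which is consistent with the values shown in the tables.

```java
import java.util.Locale;

final class JgraphSummaryCheck {
    // Sum of the CC "defined using inheritance" counts for jgraph.
    static int ccdui() {
        int standardLibrary = 14 + 12 + 0 + 2; // levels 0 and 1, then static levels 0 and 1
        int thirdParty = 0;
        int userDefined = 7 + 2 + 0 + 3;       // levels 0 and 1, then static levels 0 and 1
        return standardLibrary + thirdParty + userDefined;
    }

    // Total user-defined classes: 36 at level 0 plus 37 at level 1.
    static int classCount() {
        return 36 + 37;
    }

    // Formats a measurement as "numerator/denominator (percent%)".
    static String triplet(int numerator, int denominator) {
        long percent = Math.round(100.0 * numerator / denominator);
        return String.format(Locale.ROOT, "%d/%d (%d%%)", numerator, denominator, percent);
    }
}
```

JgraphSummaryCheck.triplet(JgraphSummaryCheck.ccdui(), JgraphSummaryCheck.classCount()) yields "40/73 (55%)", matching the CCDUI row of the summary.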

So, about 40% of classes extend JDK classes, and another 15% or so extend other jgraph classes, for an overall total of 55% of classes extending other classes.

12 of the 14 level-1 nested classes that extend a standard library class are non-static whereas 2 are static. This is consistent with use of inner classes to provide such things as event handlers for Swing (and manual inspection shows that all 12 are extending either swing or awt classes). The 2 static nested classes are extending java.awt.geom.Point2D$Double. 10 of the 13 level-1 nested classes that extend a user-defined class are non-static, and again, from manual inspection, they all look like doing event handling and similar things.

JGraph is advertised to work with Java 1.3, suggesting that it doesn't use enums or annotations, so it's unsurprising how many zeros occur here.

The CIIF proportion of 100% is not that surprising. This metric measures the number of interfaces defined by the user that are implemented. Since there isn't much point in defining an interface without implementing it, anything less than 100% should raise questions.

The most interesting measurement is DUI. According to this, 77% of user-defined types are defined by "using inheritance", that is, either extending something or implementing something. That's a much bigger number than I was expecting.

JGraph is a framework for drawing graphs, and as such makes a lot of use of the Java graphics support. As this involves dealing with the Swing framework, it is perhaps unsurprising that a large number of classes are defined using inheritance.

As always, when one actually tries to use a metric on real code (as opposed to made up examples to show off a given metric's "capabilities"), one quickly finds non-obvious situations that have to be dealt with. In this case, it's dealing with so-called marker interfaces. It turns out that a lot of jgraph's classes implement java.io.Serializable, and a few also implement java.lang.Cloneable. Counting such classes as "using inheritance" seems fairly misleading, so I don't. The measurements given above ignore any relationship to these two interfaces. I haven't found a definitive list of marker interfaces, even within the JDK, so for the moment I'll ignore only these two.

eclipse-3.2.2

Another favourite application to measure is Eclipse.
Relationship | SL/DUI | Static | TP/DUI | Static | UD/DUI | Static | IF
CC | 158/11373 (1%), 72/9728 (1%) | 0/11373 (0%), 45/9728 (0%), 2/101 (2%) | 84/11373 (1%), 4/9728 (0%) | 0/11373 (0%), 5/9728 (0%) | 6959/11373 (61%), 3079/9728 (32%), 16/101 (16%) | 0/11373 (0%), 922/9728 (9%), 27/101 (27%) | 2135/11373 (19%), 159/9728 (2%), 3/101 (3%)
CI | 88/11373 (1%), 981/9728 (10%), 3/101 (3%) | 0/11373 (0%), 429/9728 (4%), 5/101 (5%) | 163/11373 (1%), 15/9728 (0%) | 0/11373 (0%), 13/9728 (0%) | 3417/11373 (30%), 3260/9728 (34%), 15/101 (15%) | 0/11373 (0%), 594/9728 (6%), 19/101 (19%) | 1880/2175 (86%), 106/116 (91%)
II | 31/2175 (1%), 1/116 (1%) | 0 | 8/2175 (0%) | 0 | 624/2175 (29%), 8/116 (7%) | 0 | 326/2175 (15%), 2/116 (2%)
IA | 0 | 0 | 0 | 0 | 0 | 0 | 0
EI | 0 | 0 | 0 | 0 | 0 | 0 | 0
ExI | 0 | 0 | 0 | 0 | 0 | 0 | 0
ExEx | 54/11373 (0%), 20/9728 (0%), 4/101 (4%) | 0 | 0 | 0 | 35/11373 (0%) | 0 | 11/11373 (0%)

The summary measurements are:

Metric Measurement Maximum Proportion
CCDUI 11373 21202 54%
UDDUI 17215 23493 73%
CIDUI 8948 21202 42%
TCCIF 2297 21202 11%
TCIIF 1986 2291 87%
TIIIF 328 2291 14%
DUI 19216 23493 82%
IF 4361 23493 19%
Yes, that really is over 19,000 user defined types that are defined using one form of inheritance or other! Again, around the 80% mark for DUI. Note that we don't have 100% of user-defined interfaces being implemented. My guess is that this is due to the "plug-in" architecture that eclipse has, although I'm still slightly puzzled that there aren't even default implementations for all interfaces.

freecol-0.6.0

Freecol is a useful application to study. It has nothing to do with software development (unlike Eclipse), is a complete application (unlike JGraph), features lots of graphics, has a non-trivial architecture (client-server), and is big enough to be interesting but not so big as to make it hard to manually check the numbers.
Relationship | SL/DUI | Static | TP/DUI | Static | UD/DUI | Static | IF
CC | 58/285 (20%), 50/285 (18%) | 0/285 (0%), 9/285 (3%), 4/19 (21%) | 0 | 0 | 163/285 (57%), 64/285 (22%) | 0/285 (0%), 12/285 (4%) | 25/285 (9%), 6/285 (2%)
CI | 62/285 (22%), 65/285 (23%) | 0/285 (0%), 39/285 (14%), 11/19 (58%) | 0 | 0 | 32/285 (11%), 34/285 (12%) | 0/285 (0%), 1/285 (0%), 4/19 (21%) | 17/17 (100%), 3/3 (100%)
II | 0 | 0 | 0 | 0 | 1/17 (6%) | 0 | 1/17 (6%)
IA | 0 | 0 | 0 | 0 | 0 | 0 | 0
EI | 0 | 0 | 0 | 0 | 0 | 0 | 0
ExI | 0 | 0 | 0 | 0 | 0 | 0 | 0
ExEx | 1/285 (0%), 1/285 (0%) | 0 | 0 | 0 | 0 | 0 | 0

The summary measurements are:

Metric Measurement Maximum Proportion
CCDUI 360 589 61%
UDDUI 296 609 49%
CIDUI 244 589 41%
TCCIF 31 589 5%
TCIIF 20 20 100%
TIIIF 1 20 5%
DUI 532 609 87%
IF 51 609 8%

Again, over 80% for DUI. IF is well down on the others, but there aren't that many interfaces, and from CCIF we can see that there aren't that many classes being extended.

azureus-2.3.0.4

Azureus is quite a different kind of application.
Relationship | SL/DUI | Static | TP/DUI | Static | UD/DUI | Static | IF
CC | 35/1124 (3%), 19/1286 (1%) | 0/1124 (0%), 8/1286 (1%) | 0 | 0 | 479/1124 (43%), 341/1286 (27%) | 0/1124 (0%), 120/1286 (9%) | 149/1124 (13%), 5/1286 (0%)
CI | 18/1124 (2%), 32/1286 (2%) | 0/1124 (0%), 21/1286 (2%) | 0 | 0 | 697/1124 (62%), 585/1286 (45%) | 0/1124 (0%), 142/1286 (11%) | 462/476 (97%), 15/16 (94%)
II | 8/476 (2%) | 0 | 0 | 0 | 65/476 (14%) | 0 | 37/476 (8%)
IA | 0 | 0 | 0 | 0 | 0 | 0 | 0
EI | 0 | 0 | 0 | 0 | 0 | 0 | 0
ExI | 0 | 0 | 0 | 0 | 0 | 0 | 0
ExEx | 49/1124 (4%), 1/1286 (0%) | 0 | 0 | 0 | 5/1124 (0%) | 0 | 5/1124 (0%)

The summary measurements are:

Metric Measurement Maximum Proportion
CCDUI 1002 2410 42%
UDDUI 2221 2902 77%
CIDUI 1492 2410 62%
TCCIF 154 2410 6%
TCIIF 477 492 97%
TIIIF 37 492 8%
DUI 2401 2902 83%
IF 656 2902 23%

Four applications could hardly be considered representative, but still they all show quite high levels of "use of inheritance". While this isn't necessarily evidence that "inheritance is good", it's pretty clear that "inheritance is used". A larger study is clearly warranted.

Update: The larger study got done. An early writeup is available as a technical report.

Update: The study appeared in ECOOP 2008

References

[BT96] R. L. Biddle, E. D. Tempero, "Understanding the Impact of Language Features on Reusability," Fourth International Conference on Software Reuse (ICSR'96), p. 52, 1996.
[CK91] Chidamber, S. R. and Kemerer, C. F. Towards a metrics suite for object oriented design. In Conference Proceedings on Object-Oriented Programming Systems, Languages, and Applications (Phoenix, Arizona, United States, October 06 - 11, 1991). A. Paepcke, Ed. OOPSLA '91. ACM Press, New York, NY, 197-211. DOI= http://doi.acm.org/10.1145/117954.117970
[CK94] Shyam R. Chidamber and Chris F. Kemerer. A Metrics Suite for Object Oriented Design IEEE Transactions on Software Engineering, 20(6):476-493, June 1994
[FP96] Norman Fenton and Shari L. Pfleeger, "Software Metrics: A Rigorous and Practical Approach," International Thomson Computer Press, London, UK, 1997, second edition.
[JLS] The Java Language Specification, Third Edition
[McC76] Thomas J. McCabe. 1976. "A Complexity Measure" IEEE Transactions on Software Engineering SE-2 (4):308-320 Dec. 1976

History

15 May 2007
Draft completed
08 July 2007
Added the DUI and IF measurements and metrics to distinguish levels. Redid all measurements. Added Azureus measurements. Some minor rewriting to accommodate this.
02 August 2007
Added the distinction between static and non-static nested modules, plus associated writing.
September 2007
Added reference to technical report.
March 2008
Added reference to ECOOP publication.