Measuring Software

This page describes the kinds of projects I (Ewan Tempero) am interested in supervising. It is not exclusive - I will consider projects in most areas of software engineering - but the kinds of projects I describe here will tend to get priority. See my list of current and past students to get an idea of projects I have supervised in the past.

My main area of research is measuring software. This includes figuring out what is interesting or useful to measure about code, how to take those measurements, what the measurements mean in terms of how software is developed and the quality of the code, and what visualisations of the measurements can help improve software development.

Several topics must be addressed. I discuss each in more detail below, along with some ideas for projects that could be done within it. A research project may address just one of these topics or several. I include references to research publications and student research that have resulted from work on these topics.

Metric Development

There are a great many features of code that can be measured, and comparatively little work has been done in this area. Even the simplest feature seems to require many different metrics because of the way features interact. For example, to measure "inheritance" we must consider whether a class inherits from another class developed for the same application, from one of the standard library classes, or from some third-party library. We also need to consider whether (in Java or C#) inheritance between interfaces differs from inheritance between classes, and whether a class inheriting from another class should be treated differently from a class implementing an interface. These questions are discussed in [TNM08].
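To make these distinctions concrete, here is a minimal Python sketch. It assumes a small hand-built model of Java type declarations rather than a real parser, and the names Origin, TypeDecl, classify_edges and EXAMPLE_TYPES are hypothetical, introduced only for illustration. It counts inheritance edges leaving application types, separated by the kind of edge and by where the parent type comes from.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    APPLICATION = "application"
    STANDARD_LIBRARY = "standard library"
    THIRD_PARTY = "third-party"


@dataclass(frozen=True)
class TypeDecl:
    """Hypothetical, simplified stand-in for a Java type declaration."""
    name: str
    origin: Origin
    is_interface: bool = False
    extends: tuple = ()      # names of the superclass or superinterfaces
    implements: tuple = ()   # names of interfaces implemented by a class


def classify_edges(types):
    """Count inheritance edges out of application types by kind and parent origin."""
    by_name = {t.name: t for t in types}
    counts = Counter()
    for child in types:
        if child.origin is not Origin.APPLICATION:
            continue  # only the application under study is measured
        for parent_name in child.extends:
            kind = ("interface extends interface" if child.is_interface
                    else "class extends class")
            counts[(kind, by_name[parent_name].origin)] += 1
        for parent_name in child.implements:
            counts[("class implements interface", by_name[parent_name].origin)] += 1
    return counts


# Made-up example data, not taken from any real application.
EXAMPLE_TYPES = [
    TypeDecl("java.lang.Object", Origin.STANDARD_LIBRARY),
    TypeDecl("java.lang.Runnable", Origin.STANDARD_LIBRARY, is_interface=True),
    TypeDecl("org.lib.Widget", Origin.THIRD_PARTY, extends=("java.lang.Object",)),
    TypeDecl("app.Shape", Origin.APPLICATION, extends=("java.lang.Object",)),
    TypeDecl("app.Circle", Origin.APPLICATION, extends=("app.Shape",),
             implements=("java.lang.Runnable",)),
    TypeDecl("app.Button", Origin.APPLICATION, extends=("org.lib.Widget",)),
]

if __name__ == "__main__":
    for (kind, origin), n in classify_edges(EXAMPLE_TYPES).items():
        print(f"{kind} -> {origin.value}: {n}")
```

Whether an edge such as app.Shape extending java.lang.Object should count at all is exactly the kind of decision that leads to several variations of one metric.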

Example projects on this topic are:

Develop new metrics that potentially provide useful information about some feature of code (see for example [TCN10], [Tem09], [YT07], [MT07a], [MT07b], [CT07], [Choi2006], [Yang2009], [Melton], [Riaz]).
For any existing metric, develop a variation appropriate to a different language. For example, C++ and CLOS have multiple inheritance, so different inheritance metrics may be needed for those languages (see for example [Leonhardt2006], [KimUmeda2007]).
Many metrics have been proposed but never properly thought out, such as metrics for "coupling", "cohesion", inheritance, and even "size". Develop well-defined metrics for these ideas (see for example [TNM08], [Yang2009], [Melton]).

Measuring Code

Metrics by themselves are not very useful. They need to be used to take measurements of actual code. This is an important step as it is only when measurements are taken that we really get to understand what the metrics might tell us (to say nothing of whether or not the metrics are well-defined).

This involves building instruments that provide measurements for the metrics, which is usually a non-trivial process. As well as figuring out how to produce accurate measurements, we must consider when the measurements are to be taken, as this may affect the design of the instrument. For example, measurements may be taken on past releases, on nightly builds, when code is checked into version control, or as the code is being written.

New metrics, whose characteristics we don't yet fully understand, will typically be applied to many (100-500) different applications. The instruments need to be designed to make it easy to try variations on the metrics as our understanding improves. Measurements taken on nightly builds or at version control check-in require instruments that are very stable and robust and that can be integrated into different companies' source code repositories. Measurements taken while code is being written need to be integrated into the integrated development environment (IDE) and need to be fast enough not to interfere with programming activities.
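One way to keep variations on a metric easy to try is to register each variation under a name and let the instrument apply whichever ones are requested. The following Python sketch shows the idea using three crude variations of "size". It is an illustration only: the names METRICS, metric, measure and TOY_CORPUS are hypothetical, and the corpus is made-up data standing in for real applications.

```python
from typing import Callable, Dict, Iterable, List

# In this sketch a metric is any function from a list of source lines to a number.
Metric = Callable[[List[str]], float]

METRICS: Dict[str, Metric] = {}


def metric(name: str):
    """Register a metric variation under a name so it can be selected later."""
    def register(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return register


@metric("loc_physical")
def physical_lines(lines):
    return len(lines)


@metric("loc_non_blank")
def non_blank_lines(lines):
    return sum(1 for line in lines if line.strip())


@metric("loc_non_comment")
def non_comment_lines(lines):
    # Crude variation: skips blank lines and '//' line comments only.
    return sum(1 for line in lines
               if line.strip() and not line.strip().startswith("//"))


def measure(corpus: Dict[str, List[str]], names: Iterable[str]):
    """Apply the chosen metric variations to every application in the corpus."""
    names = list(names)
    return {app: {n: METRICS[n](src) for n in names}
            for app, src in corpus.items()}


# Made-up stand-in for a corpus: application name -> source lines.
TOY_CORPUS = {
    "app-a": ["// header", "class A {", "", "}"],
    "app-b": ["class B {}", ""],
}

if __name__ == "__main__":
    print(measure(TOY_CORPUS, ["loc_physical", "loc_non_blank", "loc_non_comment"]))
```

Adding a further variation then means writing one more registered function, without touching the code that walks the corpus.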

Example projects on this topic are:

Develop instruments for measuring existing or new metrics and carry out empirical studies with them (see for example [TCN10], [Tem09], [BT07], [MT07a], [Tem08], [TNM08], [MPT+08], [BFN+06], [Yang2009], [Melton], [Leonhardt2006], [Choi2006]).
Develop a framework for integrating measuring instruments into different kinds of source code repositories.
Develop new instruments, or integrate existing instruments into an IDE, for example as a plug-in for Eclipse (see for example [MT07c], [Zhang2007]).
Develop the infrastructure for making measurements in an IDE that will allow new metrics to be used without the need for new plug-ins or add-ons.

Visualising Measurements

Taking measurements usually produces a lot of data. The challenge is then to present this data in a way that allows someone to interpret it sensibly and to answer the questions being asked. Visualisation is a successful technique for presenting data in other disciplines, but it is still early days for data that comes from code. Exactly what kinds of visualisation will be successful depends on the metrics being used, the questions being asked, and the goals of taking the measurements in the first place. Presenting measurements from the same metric across many applications, in order to understand the characteristics of that metric, will likely have to be done differently from presenting data from multiple metrics for the same application during the nightly build as part of an organisation's quality control processes. Visualisations that are feasible for off-line presentation may not work when integrated into an IDE because they are too slow to create.

Example projects on this topic are:

Develop novel visualisations for different metrics (see for example [ANMT08b], [Zhang2007], [Kim2007]).
Develop different ways to present the visualisations (see for example [ANMT08a]).

Interpreting Measurements

Once the data can be presented, it is time to figure out what it all means. What does it tell us about the current state of the code? Is everything going to plan? What decisions need to be made?

Example projects on this topic are:

Determine how characteristics of some measurements relate to quality attributes of the code, such as modifiability, understandability, or testability (see for example [Yang2009], [MT07d]).

Supporting Software Metrics Research

Much of the research I have been doing has involved developing new metrics and taking measurements with them. This requires something to actually measure. We can learn more about an individual metric if we can relate its measurements to those from another metric, but this requires that both metrics are always applied to the same thing. An important product supporting my research is a standard software corpus. This is a collection of open-source software that has been organised to allow (relatively) large-scale empirical studies. It is also being distributed to other research groups. More information is available here.
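As a small illustration of why measuring the same thing matters, the Python sketch below pairs up measurements from two metrics only where both were taken on the same application and version, and then computes a correlation over those pairs. The helper names paired and pearson, the metric names, and all the numbers are hypothetical, made up for this example. Pearson's coefficient is used simply as one familiar choice, and a rank-based measure could be substituted.

```python
from math import sqrt


def paired(measure_a, measure_b):
    """Keep only the corpus entries measured by both metrics, in a stable order."""
    shared = sorted(measure_a.keys() & measure_b.keys())
    xs = [measure_a[key] for key in shared]
    ys = [measure_b[key] for key in shared]
    return xs, ys, shared


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant measurements")
    return cov / sqrt(var_x * var_y)


# Made-up measurements keyed by (application, version).
CLASS_COUNT = {("app-a", "1.0"): 10, ("app-a", "2.0"): 20,
               ("app-b", "1.0"): 30, ("app-c", "3.1"): 5}
CYCLE_COUNT = {("app-a", "1.0"): 1, ("app-a", "2.0"): 2,
               ("app-b", "1.0"): 3, ("app-d", "0.9"): 7}

if __name__ == "__main__":
    xs, ys, shared = paired(CLASS_COUNT, CYCLE_COUNT)
    print(f"{len(shared)} shared entries, r = {pearson(xs, ys):.3f}")
```

Entries measured by only one of the two metrics are dropped before comparison, which is the situation a shared corpus is meant to avoid.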

Example projects on this topic are:

Determine what quality control measures are necessary to ensure that the corpus meets the requirements of good empirical software engineering research (see for example [Han2008]).

Relevant Publications

This section lists publications that come directly from this research. Many of the publications are derived from research by students; the theses and reports are listed below.

Research Publications

[TCN10]: Ewan Tempero, Steve Counsell and James Noble 'An Empirical Study of Overriding in Open Source Java' Thirty-Third Australasian Computer Science Conference (ACSC2010), January 2010.
[Tem09]: Ewan Tempero 'How Fields are Used in Java: An Empirical Study' Australian Software Engineering Conference (ASWEC), April 2009.
[MPT+08]: Radu Muschevici, Alex Potanin, Ewan Tempero and James Noble 'Multiple Dispatch in Practice' ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, October 2008.
[TNM08]: Ewan Tempero, James Noble and Hayden Melton 'How do Java Programs Use Inheritance? An Empirical Study of Inheritance in Java Software' 22nd European Conference on Object-Oriented Programming (ECOOP), Springer Berlin / Heidelberg, Paphos, Cyprus. July 2008. pp. 667-691. [TR] [Publisher]
[MAT08]: Homan Ma, Robert Amor and Ewan Tempero 'Indexing the Java API Using Source Code' Australian Software Engineering Conference (ASWEC), Perth, Australia. March 2008. pp. 451-460. [TR] [DOI]
[YMT08]: Hong Yul Yang, Hayden Melton and Ewan Tempero 'An Empirical Study into Use of Dependency Injection in Java' 19th Australian Software Engineering Conference (ASWEC), Software Engineering Research Report, University of Auckland Perth, Australia. March 2008. pp. 239-247. [TR] [DOI]
[ANMT08b]: Craig Anslow, James Noble, Stuart Marshall and Ewan Tempero 'Visualizing the Word Structure of Java Class Names' OOPSLA 2008 Poster, October 2008.
[ANMT08a]: Craig Anslow, James Noble, Stuart Marshall and Ewan Tempero 'Towards End-User Web Software Visualization' Graduate Consortium at the IEEE Symposium on Visual Languages and Human Centric Computing (VLHCC), Herrsching am Ammersee, Germany. September 2008. [PDF]
[Tem08]: Ewan Tempero 'An Empirical Study of Unused Design Decisions in Open-source Java Software' UoA-SE-2008-1, Software Engineering Research Report, University of Auckland, June 2008. [TR]
[MT07a]: Hayden Melton and Ewan Tempero 'An Empirical Study of Cycles among Classes in Java' Empirical Software Engineering, 12:4, Springer Netherlands, August 2007. pp. 389-415. [TR] [DOI]
[BT07]: Richard Barker and Ewan Tempero 'A Large-Scale Empirical Comparison of Object-Oriented Cohesion Metrics' Fourteenth Asia-Pacific Software Engineering Conference, Nagoya, Japan. December 2007. pp. 414-421. [TR] [DOI]
[MT07d]: Hayden Melton and Ewan Tempero 'Static Members and Cycles in Java Software' 1st International Symposium on Empirical Software Engineering and Measurement (ESEM), September 2007. pp. 136-145. [PDF] [DOI]
[YT07]: Hong Yul Yang and Ewan Tempero 'Measuring the Strength of Indirect Coupling' Australian Software Engineering Conference, IEEE Computer Society, Melbourne, Australia. April 2007. pp. 319-328. [TR] [DOI]
[MT07b]: Hayden Melton and Ewan Tempero 'The CRSS Metric for Package Design Quality' Australasian Computer Science Conference, Published as CRPIT 62. Australian Computer Science Communications, Ballarat, Australia. January 2007. pp. 201-210. [TR] [Publisher]
[MT07c]: Hayden Melton and Ewan Tempero 'JooJ: Real-time Support for Avoiding Cyclic Dependencies' Australasian Computer Science Conference, Published as CRPIT 62. Australian Computer Science Communications, January 2007. pp. 87-95. [TR] [Publisher]
[CT07]: Kelvin H T Choi and Ewan Tempero 'Dynamic Measurement of Polymorphism' Australasian Computer Science Conference, Published as CRPIT 62. Australian Computer Science Communications, Ballarat, Australia. January 2007. pp. 211-220. [Publisher]
[MAT06]: Homan Ma, Robert Amor and Ewan Tempero 'Usage Patterns of the Java Standard API' Thirteenth Asia Pacific Software Engineering Conference (APSEC06), IEEE Computer Society, Bangalore, India. December 2006. pp. 342-349. [DOI]
[BFN+06]: Gareth Baxter, Marcus Frean, James Noble, Mark Rickerby, Hayden Smith, Matt Visser, Hayden Melton and Ewan Tempero 'Understanding the Shape of Java Software' ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, Portland, OR, U.S.A. October 2006. pp. 397-412. [DOI]
[MT06]: Hayden Melton and Ewan Tempero 'Identifying Refactoring Opportunities by Identifying Dependency Cycles' Twenty-Ninth Australasian Computer Science Conference, Published as CRPIT 48. Hobart, Tasmania, Australia. January 2006. pp. 35-42. [Publisher]
[YTB05]: Hong Yul Yang, Ewan Tempero and Rebecca Berrigan 'Detecting Indirect Coupling' The Australian Software Engineering Conference, IEEE Computer Society, Brisbane, Australia. March 2005. pp. 212-221. [DOI]

Graduate Projects

[Yang2009]: Hong Yul Yang, PhD 2009. Measuring Indirect Coupling
[Melton]: Hayden Melton, PhD. Measuring the Effect of Refactoring and Design Patterns on Software Quality (in progress).
[Riaz]: Mehwish Riaz, PhD. Understanding the impact of database design on the design quality of software systems (in progress).
[Han2008]: Ted Pei-Hsuan Han, Summer Research Scholarship 2007-2008. Improving a Software Corpus
[KimUmeda2007]: Misun Kim and Taiga Umeda, BE(SE) Part IV project 2007. Mozilla Source Code Analysis.
[Barker2007]: Richard Barker, ME(SE) 2007. An Empirical Study of Cohesion Metrics
[Ma2007]: Homan Ma, ME(SE) 2007. Using Variable Identifiers to Index the Java 1.4.2 API
[Kim2007]: Misun Kim, Summer Research Scholarship 2006-2007. X3D Visualisation of Software Metrics Data
[Zhang2007]: Huinan Zhang, Summer Research Scholarship 2006-2007. An Eclipse interface for JooJ
[Leonhardt2006]: Enrico Leonhardt, BSc Postgraduate Project 2006. An empirical study of power-laws and cycles in C# applications
[Choi2006]: Hio Tong (Kelvin) Choi, ME(SE) 2006. Dynamic Reuse Metrics