Detailed Product Knowledge is not a Prerequisite for an Effective Formal Software Inspection

 

Howard E. Dow

Adjunct Faculty

Department of Computer Science

University of Massachusetts at Lowell

Lowell, MA 01854

 

James S. Murphy

Adjunct Lecturer

Division of Continuing Education

Merrimack College

Andover, MA 01845

 

 

Abstract: Contrary to common wisdom, formal software inspections, also known as Fagan inspections, can be conducted effectively by teams that include members who lack in-depth product knowledge. In fact, the differing viewpoint provided by those without detailed product knowledge may yield a more robust inspection. In this paper we show that an effective inspection can be achieved by a properly prepared and trained team even when its members are not familiar with the details of the product, or even with its domain.

 

Keywords: Inspections, formal inspections, software inspections.

 

Introduction

This paper details the results of our experience to date in having several teams perform a formal software inspection [Fagan] on the same software product. Our discussion will establish terminology, state our hypothesis, profile the inspectors, describe the training the inspectors received, outline the inspection process used, provide a product overview, state our results to date and discuss the impact of those results.

 

Terminology

To improve readability, we use the word "inspection" to mean a Fagan-style formal software inspection.

 

We use the phrase "control group" to mean those inspectors who had detailed product knowledge prior to performing the inspection. These inspectors were either part of the initial development team or were closely associated with those developing the product.

 

We use the term "study group(s)" to mean one or all of the inspection teams whose members conducted inspections without any knowledge of the product prior to participating in the inspection.

 

We use the term “semantic defect” to refer to a defect that could result in incorrect program execution.

 


Hypothesis

Formal software inspections were first introduced by Michael Fagan in 1976. In the original [Fagan] and subsequent articles [Ackerman], it is suggested that the people performing the inspection should either be familiar with the product to be inspected or, at a minimum, have knowledge of products similar to the one being inspected. In our jobs outside the academic environment, we often noted that it was impossible to get people familiar with the product to take the time from their busy schedules to prepare for and execute inspections. However, there are often other engineers, typically unfamiliar with the product details, who would be available to perform an inspection. Given this need to increase the pool of potential inspectors by drawing on people without detailed product knowledge, we began to question the rigidity of the requirement that inspectors have detailed product knowledge. We hypothesized that detailed product knowledge is not required for individuals to be effective members of a software inspection team, and we wanted to test this theory. Furthermore, we wanted to compare the results of an inspection done by a control group to inspections done by teams without detailed product knowledge. Our questions evolved into the following list:

 

     When compared to the control group, what percentage of defects can be found by teams without detailed product knowledge?

 

     What factors contribute to the ability or inability of a team to detect known defects?

 

     Are there any differences between the types of defects found by the control group and those found by the study group?

 

     Are there significant differences in the amount of time spent by either group when preparing for or executing an inspection?

 

Profile of the Inspectors

The control group consisted of engineers averaging 8 years of industrial experience, all of whom were attending the Master of Software Engineering program at Carnegie Mellon University. This group received instruction in formal software inspections as part of their degree program. The product under inspection was produced as part of a multi-semester project. The inspection by the control group took place nine months after they received the instruction, by which time they had acquired significant experience in inspecting products such as the one used in this study.

 

The study group(s) consisted predominantly of practicing software professionals enrolled in the graduate computer science curriculum at the University of Massachusetts at Lowell (UML) or the continuing education program at Merrimack College. The UML inspectors were taking the course titled "A Discipline for Software Engineering" [Humphrey 94-1, Humphrey 94-2]. The topic of software inspections was incorporated into this course by the instructor. At Merrimack College, the students were taking "Software Engineering I". Again, the topic of inspections was one element of this course.

 

Process Used

As previously stated, two types of teams conducted the inspections: a control group and a study group. With the minor exceptions noted below, the process used to train, prepare, conduct and report on the inspection was similar for all teams.

 

Training and Inspection Preparation

Both the control group and the study groups were given a 1.5-hour lecture describing inspections, the inspection process, and the roles and responsibilities of each person participating in the inspection. These lectures were given by individuals experienced in the industrial application of the inspection technique. In addition, the control group used a 90-minute interactive computer-based instruction (CBI)/video on inspections. This video was being tested by the education department at the Software Engineering Institute, and the control group participated in the evaluation of this product. The study groups, however, watched the two videos supplied with “Materials for the Instruction of Formal Inspections” [Tomayko] instead of the 90-minute interactive CBI/video. These videotapes covered the same material as the CBI/video.

 

Having completed inspection training, both the control group and the study groups were given an overview of the product by either the producer or someone coached in the details of the product by the producer. At the conclusion of the overview, each person was given a review package, the contents of which are discussed later.

 

After completion of the overview, and before beginning individual preparation, roles for the inspection meeting were assigned. Table 1 outlines these roles.

 

Moderator:

For the control group, the moderator also acted as a reviewer and reported defects he found. 

For the study groups, the moderator was one of the instructors in the class and did not participate in the defect finding portion of the inspection.

Producer:

In all cases the actual producer filled this role.

Recorder:

We requested a volunteer for this task. The major selection criterion was willingness to write the inspection report.

Reader:

This person was a volunteer; the major selection criteria were willingness to do the job and strong familiarity with the implementation language, C.

Inspectors:

All other people.

 

Table 1 - Roles during inspections

 

The next step was individual preparation. In all cases, the inspection teams were given one week to review the materials.

 


The Actual Inspection

The actual inspection was conducted in a manner similar to that outlined by Fagan. The recorder captured all preparation data, kept track of the defects found, and reported any defects found during his review of the product. The reader paraphrased the code and each person brought forward their issues when that particular range of line numbers was encountered. On occasion the moderator needed to adjust the direction of the meeting when participants changed from finding defects to creating solutions or modifying the basic design. After all lines of code had been examined, the recorder verified the defect data captured by reading it back to the group. Any errors in the data captured were corrected at this time. In addition, the team assigned a severity for each defect found.

 

Reporting

After the inspection, the recorder was responsible for writing the inspection report. Here, time spent and a summary of defects found were documented. In addition, an attachment included the details on all defects found during the inspections.

 

Product Inspected

The software product being inspected was part of a robot being developed for the National Aeronautics and Space Administration (NASA). The purpose of the robot was to inspect the heat shield tiles on the space shuttle. After each mission, approximately 17,000 tiles had to be inspected and injected with a toxic chemical. To minimize worker exposure to this chemical, NASA undertook a proof-of-concept project to develop this robotic maintenance system. At the time the control group inspected the product, it was under development by the Field Robotics Center at Carnegie Mellon University. The robot has since been delivered to NASA [Nordwall].

 

The specific software being inspected was a control module that would read data from a joystick (the joystick was controlled by an operator), convert the joystick data into motor control commands, and send these commands to another processor which would actually drive the robot’s wheels. Additionally, the software needed to respond to operator authorization messages and override messages from other processes.  This software consisted of 525 non-comment source lines of “C” code.
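
To make the discussion concrete, the following sketch suggests how such a control module might be organized. It is purely illustrative: the identifiers, message handling, and joystick-to-motor conversion shown here are our own assumptions and are not taken from the inspected source.

    /* Hypothetical sketch only; names and conversion logic are illustrative,
       not taken from the inspected product. */
    #include <stdio.h>
    #include <stdbool.h>

    typedef struct { int x; int y; }        JoystickSample;  /* assumed joystick reading */
    typedef struct { int left; int right; } MotorCommand;    /* assumed wheel command    */

    /* Stub interfaces standing in for the operator and drive processes. */
    static JoystickSample read_joystick(void)        { JoystickSample js = { 10, 50 }; return js; }
    static bool operator_authorized(void)            { return true;  }
    static bool override_pending(void)               { return false; }
    static void send_motor_command(MotorCommand cmd) { printf("left=%d right=%d\n", cmd.left, cmd.right); }

    /* One pass of the control loop: read the joystick, convert the sample into
       a motor command, and forward it to the drive processor, unless an
       override or a missing operator authorization requires the wheels to stop. */
    static void control_step(void)
    {
        MotorCommand cmd = { 0, 0 };             /* default: stop the wheels   */

        if (operator_authorized() && !override_pending())
        {
            JoystickSample js = read_joystick();
            cmd.left  = js.y + js.x;             /* simple differential mixing */
            cmd.right = js.y - js.x;
        }
        send_motor_command(cmd);
    }

    int main(void)
    {
        control_step();
        return 0;
    }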

 

Review Package Contents

Each inspection team member was provided a review package which contained the following materials:

            1. A line-numbered source code listing

            2. The design document chapters pertaining to the software being inspected

            3. The relevant section of the requirements document

            4. The coding standard

            5. The appropriate section of the program maintenance document.

 


Conditions of the Inspection

In many respects, the conditions for the study groups were actually worse than those typically encountered in an industrial setting. These conditions were as follows:

 

            1. During all inspections, the producer was the person who actually wrote the code. Except during the initial inspection by the control group, the producer did not participate in the defect finding activity, but was present to answer questions and provide clarification as needed. Contrast this with an industrial setting where the producer does participate in finding defects. This difference resulted in the burden of defect detection falling squarely on the remaining members of each study group.

 

            2. With the exceptions of the producer and moderator, all members of the study groups had their first training and actual inspection experience during the course being taught. Compare this with typical industrial settings where most inspectors have had inspection training and may have participated in one or more previous inspections.

 

            3. None of the study group members possessed any product knowledge. While our hypothesis is that individuals can effectively contribute to inspections without having detailed product knowledge, forming a team consisting primarily of these individuals may stretch this hypothesis to its limits.

 

Considering all the above conditions, the defect detection data for the study groups are very encouraging.

 

Results to Date

During five inspections, a total of 169 unique defects were identified in the product. The most common type of defect detected (approximately 1/3 of all defects detected to date) was violation of the coding standard. However, of the 169 defects, 8 were classified by the teams as semantic defects.

 

Table 2 and Table 3 below give the defect data for all inspection teams.  Both the total number of defects detected as well as the number of defects matching those found by the control group (coverage) are presented.

 

 

                                  Control     Team A      Team B      Team C      Team D
                                  Group
Defects Detected                  72 (43%)    33 (20%)    53 (31%)    77 (46%)    34 (20%)
(Percentage of total defects)
Control Group Coverage              --          19          16          21          13

 

Table 2 - Number of defects found by all inspection teams.

 


 

 

                                       Control     Team A      Team B      Team C      Team D
                                       Group
Semantic Defects Detected              5 (63%)     4 (50%)     1 (13%)     4 (50%)     5 (63%)
(Percentage of all semantic defects)
Semantic Control Group Coverage          --          3           1           2           3

 

Table 3 - Number of semantic defects found by all inspection teams.

 

Of the 169 defects detected, many were detected by more than one inspection team. The percentage of defects located by multiple teams decreases as the number of teams increases. It should be noted, however, that the decrease is somewhat less dramatic when considering only the semantic defects. This indicates that while the inspection teams varied in their detection of individual defects, they located semantic defects with reasonable consistency. The distribution of all defects across the inspection teams, including the control group, is shown in Table 4. Table 5 summarizes the distribution for the semantic defects.

               

Number of Teams Detecting     Number of Defects     Percentage of Total Defects Found
Detected by 1 team                  105                          62%
Detected by 2 teams                  44                          26%
Detected by 3 teams                   9                           5%
Detected by 4 teams                   6                           4%
Detected by 5 teams                   5                           3%

 

Table 4 - Distribution of total defects found by teams

 

 

Number of Teams Detecting     Number of Semantic Defects     Percentage of Semantic Defects Found
Detected by 1 team                        3                               38%
Detected by 2 teams                       2                               25%
Detected by 3 teams                       1                               13%
Detected by 4 teams                       1                               13%
Detected by 5 teams                       1                               13%

 

Table 5 - Distribution of semantic defects found by teams

 

Tables 4 and 5 present interesting data. However, one cannot compare the inspection teams with the control group without discussing the effectiveness of the control group itself. Of the 169 defects reported, the control group missed 97, including three semantic defects. As such, the control group cannot be used as an absolute indicator of the defects present in the product, but rather only as a reference point for relative comparisons.

 

Table 6 lists the summary inspection data for all five groups. Note that the study groups spent approximately the same amount of time preparing for the inspections as the control group.  Therefore, it does not appear that the lack of product knowledge increased the amount of preparation time required.


 

 

                                        Control    Group A    Group B    Group C    Group D
                                        Group
Team Size                                  4          7          4          7          4
Total Preparation Time (Hours)           6.25        9.2        5.6        9.5        7.6
Preparation Time per Person (Hours)      1.56       1.31       1.40       1.35       1.9
Total Defects Found                        72         33         53         77         33
Total Defects per Hour of Prep           11.5        3.6        9.5        8.1        4.3
Total Defects Found per KLOC              137         63        101        147         63
Semantic Defects Found                      5          4          1          4          5
Semantic Defects per Hour of Prep         0.8        0.4        0.2        0.4        0.7
Semantic Defects Found per KLOC            10          8          2          8         10

 

Table 6 - Summary inspection data (KLOC: thousand lines of code)
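
For reference, the rate rows in Table 6 follow directly from the raw counts, the product size of 525 non-comment source lines (0.525 KLOC), and the preparation hours. For the control group, for example:

    72 defects / 0.525 KLOC          ≈ 137 defects per KLOC
    72 defects / 6.25 hours          ≈ 11.5 defects per hour of preparation
    5 semantic defects / 0.525 KLOC  ≈ 10 semantic defects per KLOC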

 

Analysis and Potential Impact

Analysis of the individual defects recorded by the inspection teams and their textual descriptions reveals that nearly every defect detected by the study groups was prompted by a comparison to an “oracle.” The concept of the oracle is simple: if inspectors have something to compare the product against, they can discover differences and report defects, or at least raise a question. The oracle can be a coding standard, a design document, a maintenance document, comments in the code, or even other parts of the code itself. In general, if two things do not appear to match, one should suspect a defect.

 

While the sample size used in this study is too small to be statistically significant, an observation can be made regarding a trend in the resulting data. Groups A and D found the fewest defects overall; however, they did the best job of identifying semantic defects. Group B, on the other hand, reported a large number of defects but did not identify as many of the semantic defects. In fact, Group B reported far more typographical errors and coding standard violations, while Groups A, C and D tended to ignore superficial errors in favor of more significant defects. Perhaps the use of automated techniques for enforcing coding standards (such as “pretty-printers” and format analyzers) would prevent many of these defects from entering the inspection in the first place. This would allow teams to remain more closely focused on locating logical and functional defects.

 

As stated earlier, the study groups had no direct product knowledge. Instead, they relied on the product knowledge contained in the supporting materials. The importance of these materials as oracles adds further incentive for organizations to conduct software development in a defined and controlled manner that produces key artifacts at specific stages of development. As an example, consider the semantic defect found by all five inspection teams. In this case the code acted as an oracle itself.

 

     if ( Return_Value == SUCCESS )
     {
         send_message ( FAIL_MESSAGE );
     }

 

The inspectors concluded that a defect was present because of the apparent contradiction of a “successful” return value resulting in the generation of a failure message.  In fact, the defect is that the comparison operator should be an inequality instead of an equality.  Finding this defect was only possible because the coding standard mandated the use of enumerated types or symbolic constant names instead of literal constants.
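
The repair implied by this finding is simply to invert the comparison; a sketch of the corrected code (the surrounding logic is assumed unchanged) follows:

     if ( Return_Value != SUCCESS )    /* inequality: report a failure only when the call did not succeed */
     {
         send_message ( FAIL_MESSAGE );
     }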

 

Other oracles useful in identifying defects included the function header comments (which stated expected pre- and post-conditions) as well as in-line comments. 
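
As a hypothetical illustration (this function is our own, not part of the inspected product), a header comment acting as an oracle exposes a defect when it disagrees with the body of the function:

     /* Hypothetical example, not from the inspected product.
        Pre-condition : count > 0
        Post-condition: returns the average of the first 'count' readings */
     int average_reading ( const int readings[], int count )
     {
         int sum = 0;
         int i;

         for ( i = 0; i <= count; i++ )   /* '<=' visits count + 1 elements, one more than  */
         {                                /* the header comment promises; the mismatch with */
             sum += readings[i];          /* the stated pre/post-conditions is what an      */
         }                                /* inspector would flag as a likely defect.       */
         return sum / count;
     }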

 

Some defects found by the control group were missed by the study groups. These defects were generally limited to cases where supporting materials were either incomplete or incorrect.  Therefore, it can be expected that the study groups would miss these defects, as there is no oracle to use.

 

An interesting observation was made when examining those defects missed by the control group. In three cases, the control group failed to detect a dangerous coding practice that all of the study groups detected. In one of these cases, an integer variable was being compared to a floating-point constant. The producer of the code identified this as a case of being too familiar with the product under inspection, as this was old code that had not previously malfunctioned. In this instance, the lack of product knowledge allowed the study groups to view the product from a perspective different from that of the control group. We feel this resulted in a more robust evaluation of the product.
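
A minimal sketch of that kind of hazard is shown below; the variable name and constant are hypothetical, not taken from the actual code.

     #include <stdio.h>

     int main ( void )
     {
         int mode = 1;                   /* hypothetical integer state variable */

         /* Comparing an integer against a floating-point constant forces an
            implicit conversion; here no integer value can ever equal 0.5, so
            the branch can never be taken. This is one way such a comparison
            can hide a latent defect. */
         if ( mode == 0.5 )
         {
             printf ( "half-speed mode\n" );
         }
         return 0;
     }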

 

Plans for the Future

Our plans for the future include the following:

     Gathering additional inspection data on this product.

     Incorporating a defect classification and severity standard into the process and evaluating the results.

     Collecting sufficient data to establish the repeatability of the inspection process.

     Determining which type of oracle is most useful in identifying defects.

     Evaluating if inspectors trained in the Personal Software Process discipline are more effective than others.

 

Acknowledgments

The authors would like to thank the following people and organizations: Dr. Jim Tomayko of the Master of Software Engineering program at the Software Engineering Institute, Carnegie Mellon University, for teaching us formal software inspections and being an enthusiastic supporter of our idea. Watts Humphrey for asking us good questions. Manuel Rosso-Llopart for creating the coding standard. The students in our classes for having fun watching the video, learning about formal software inspections, and inspecting the code. Celia Menzia for inspecting this paper (without any product knowledge) and finding many more defects than we could have imagined were there.  Finally, the University of Massachusetts at Lowell and Merrimack College for having the vision to offer software engineering courses and allowing us the latitude to develop and adjust them as we saw fit.

 


References

[Ackerman] A. Frank Ackerman, Lynne S. Buchwald, Frank H. Lewski, "Software Inspections: An Effective Verification Process,” IEEE Software, May 1989.

 

[Fagan] M. E. Fagan, "Design and Code Inspections to Reduce Errors in Program Development," IBM Systems Journal, Vol. 15, No. 3, 1976, pp. 182-211.

 

[Humphrey 94-1] Watts S. Humphrey, “A Discipline for Software Engineering,” Addison-Wesley, Reading, Massachusetts, 1995.

 

[Humphrey 94-2] Watts S. Humphrey, "Disciplined Software Engineering Course Plan,” Software Engineering Education Workshop, Sorrento, Italy, May 21, 1994.

 

[Nordwall] Bruce D. Nordwall, “Robot Replacing Humans to Service Shuttle Tiles,” Aviation Week and Space Technology, June 27, 1994, p. 76.

 

[Tomayko] Dr. James Tomayko and James Murphy, “Materials for the Instruction of Formal Inspections,” Software Engineering Institute (SEI) Academic Series, 1993.

 

©1995 Howard E. Dow

©1995 James S. Murphy