Detailed Product Knowledge is not a Prerequisite for an Effective Formal Software Inspection
Howard E. Dow, Adjunct Faculty, Department of Computer Science, University of Massachusetts at Lowell, Lowell, MA 01854
James S. Murphy, Adjunct Lecturer, Division of Continuing Education, Merrimack College, Andover, MA 01845
Abstract: Contrary to common wisdom,
formal software inspections, also known as Fagan inspections, can be
effectively conducted by teams including members who lack in-depth product knowledge.
In fact, the differing viewpoint provided by those without detailed product
knowledge may yield a more robust inspection. In this paper we show that an
effective inspection can be achieved by a properly prepared and trained team
even though they are not familiar with the details of the product or even the
domain of the product.
Keywords: Inspections, formal
inspections, software inspections.
Introduction
This
paper details the results of our experiences to date on having several teams
perform a formal software inspection [Fagan] on the same software product. Our
discussion will establish terminology, state our hypothesis, profile the
inspectors, describe the training the inspectors received, outline the
inspection process used, provide a product overview, state our results to date
and discuss the impact of those results.
Terminology
To
improve readability we use the word "inspection" to mean Fagan style
formal software inspections.
We
use the phrase "control group" to mean those inspectors who had
detailed product knowledge prior to performing the inspection. These inspectors
were either part of the initial development team or were closely associated
with those developing the product.
We
use the term "study group(s)" to mean one or all of the inspection
teams consisting of those people who conducted inspections but did not have any
knowledge of the product prior to participating in the inspection.
We
use the term “semantic defect” to refer to a defect that could result in
incorrect program execution.
Hypothesis
Formal
software inspections were first introduced by Michael Fagan in 1976. In the
original [Fagan] and subsequent articles [Ackerman], it is suggested that the
people performing the inspection should be either familiar with the product to
be inspected or, at a minimum, have knowledge of products similar to that being
inspected. In our jobs outside the academic environment we often noted that it
was impossible to get people familiar with the product to take the time from their
busy schedules to prepare for and execute inspections. However, there are often
other engineers, typically unfamiliar with the product details, who would be
available to perform an inspection. Based on this need to increase the pool of
potential inspectors, we began to question the rigidity of the requirement that
inspectors have detailed product knowledge. We hypothesized that detailed product
knowledge is not required for individuals to be effective members of a software
inspection team and we wanted to test this theory. Furthermore, we wanted to compare the results of an inspection
done by a control group to inspections done by teams without detailed product
knowledge. Our questions evolved to the following list:
When compared to the control
group, what percentage of defects can be found by teams without detailed
product knowledge?
What factors contribute to the
ability or inability of a team to detect known defects?
Are there any differences
between the types of defects found by the control group and those found by the
study group?
Are there significant
differences in the amount of time spent by either group when preparing for or
executing an inspection?
Profile of the Inspectors
The
control group consisted of engineers averaging 8 years of industrial experience,
all of whom were attending the Master of Software Engineering program at
Carnegie Mellon University. This group
received instruction in formal software inspections as part of their degree
program. The product under inspection
was produced as part of a multi-semester project. The inspection by the control group took place nine months after
they received the instruction, by which time they had acquired significant
experience in inspecting products such as the one used in this study.
The
study group(s) consisted predominantly of practicing software professionals
enrolled in the graduate computer science curriculum at the University of
Massachusetts at Lowell (UML) or the continuing education program at Merrimack
College. The UML inspectors were taking the course titled "A Discipline for
Software Engineering" [Humphrey 94-1, Humphrey 94-2]. The topic of
software inspections was incorporated into this course by the instructor. At
Merrimack College, the students were taking "Software Engineering I".
Again, the topic of inspections was one element of this course.
Process Used
As
previously stated, two types of teams conducted the inspections: a control
group and a study group. With the minor exceptions noted below, the process
used to train, prepare, conduct and report on the inspection was similar for
all teams.
Training and Inspection
Preparation
Both
the control group and the study groups were given a 1.5-hour lecture describing
inspections, the inspection process, and the roles and responsibilities of each
person participating in the inspection. These lectures were given by
individuals experienced in the industrial application of the inspection
technique. In addition, the control group used a 90-minute interactive computer-based
instruction (CBI) / video on inspections. This CBI / video was being tested by
the education department at the Software Engineering Institute and the control
group participated in the evaluation of this product. The study groups, however, watched the two videos supplied with “Materials
for the Instruction of Formal Inspections” [Tomayko] instead of the 90-minute
interactive CBI / video. These video tapes covered the same material as the CBI
/ video tapes.
Having
completed inspection training, both the control group and study groups were
given an overview of the product by either the producer or someone coached in the
details of the product by the producer. At the conclusion of the overview, each
person was given a review package, the contents of which are discussed later.
After
completion of the overview, and before beginning individual preparation, roles
for the inspection meeting were assigned. Table 1 outlines these roles.
Moderator:  For the control group, the moderator also acted as a reviewer and
            reported defects he found. For the study groups, the moderator was
            one of the instructors in the class and did not participate in the
            defect-finding portion of the inspection.
Producer:   In all cases the actual producer filled this role.
Recorder:   We requested a volunteer for this task. The major selection
            criterion was willingness to write the inspection report.
Reader:     This person was a volunteer; the major selection criteria were
            willingness to do the job and strong familiarity with the
            implementation language, C.
Inspectors: All other people.
Table 1 -
Roles during inspections
The
next step was individual preparation. In all cases, the inspection teams were
given one week to review the materials.
The Actual Inspection
The
actual inspection was conducted in a manner similar to that outlined by Fagan.
The recorder captured all preparation data, kept track of the defects found,
and reported any defects found during his review of the product. The reader
paraphrased the code and each person brought forward their issues when that
particular range of line numbers was encountered. On occasion the moderator
needed to adjust the direction of the meeting when participants changed from
finding defects to creating solutions or modifying the basic design. After all
lines of code had been examined, the recorder verified the defect data captured
by reading it back to the group. Any errors in the data captured were corrected
at this time. In addition, the team assigned a severity for each defect found.
Reporting
After
the inspection, the recorder was responsible for writing the inspection report.
Here, time spent and a summary of defects found were documented. In addition, an
attachment included the details on all defects found during the inspections.
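As an illustration only (the actual inspection forms are not reproduced here), the following C structures sketch the kind of information each report captured; every type and field name below is our own invention.

/* Illustrative only: the kind of information the inspection reports
   contained.  All names are assumptions, not taken from the actual forms. */
typedef enum { SEVERITY_MINOR, SEVERITY_MAJOR } Severity;

typedef struct {
    int         first_line;    /* location in the line-numbered listing */
    int         last_line;
    Severity    severity;      /* assigned by the team at the end of the meeting */
    const char *description;   /* e.g. "coding standard violation: literal constant" */
} DefectRecord;

typedef struct {
    double       total_prep_hours;   /* preparation data read back by the recorder */
    int          defect_count;
    DefectRecord defects[200];       /* details attached to the written report */
} InspectionReport;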
Product Inspected
The
software product being inspected was part of a robot being developed for the
National Aeronautics and Space Administration (NASA). The purpose of the robot
was to inspect the heat shield tiles on the space shuttle. After each mission,
approximately 17,000 tiles had to be inspected and injected with a toxic
chemical. To minimize worker exposure to this chemical, NASA undertook a
proof-of-concept project to develop this robotic maintenance system. At the
time the control group inspected the product, it was under development by the
Field Robotics Center at Carnegie Mellon University. The robot has since been
delivered to NASA [Nordwall].
The
specific software being inspected was a control module that would read data
from a joystick (the joystick was controlled by an operator), convert the
joystick data into motor control commands, and send these commands to another
processor that would actually drive the robot’s wheels. Additionally, the
software needed to respond to operator authorization messages and override
messages from other processes. This
software consisted of 525 non-comment source lines of “C” code.
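To make the structure of such a module concrete, the following is a minimal sketch in C of the control flow just described. All type, function, and message names are invented for illustration and are not taken from the inspected source.

/* Hypothetical sketch of the control module's main loop, based only on the
   description above. */
typedef struct { int x; int y; } JoystickSample;
typedef struct { int left; int right; } MotorCommand;

extern JoystickSample read_joystick(void);
extern MotorCommand   convert_to_motor_command(JoystickSample sample);
extern void           send_to_drive_processor(MotorCommand command);
extern int            operator_authorized(void);   /* authorization message received? */
extern int            override_requested(void);    /* override message from another process? */

void control_loop(void)
{
    JoystickSample sample;
    MotorCommand   command;

    for (;;) {
        if (override_requested() || !operator_authorized()) {
            continue;   /* hold the wheels until the operator is cleared */
        }
        sample  = read_joystick();
        command = convert_to_motor_command(sample);
        send_to_drive_processor(command);
    }
}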
Review Package Contents
Each
inspection team member was provided a review package which contained the
following materials:
1. A line-numbered source code
listing
2. The design document chapters
pertaining to the software being inspected
3. The relevant section of the
requirements document
4. The coding standard
5. The appropriate section of the
program maintenance document.
Conditions of the Inspection
In
many respects, the conditions for the study groups were actually worse than
might be encountered in an industrial setting. These circumstances were as follows:
1. During all inspections, the
producer was the person who actually wrote the code. Except during the initial
inspection by the control group, the producer did not participate in the defect
finding activity, but was present to answer questions and provide clarification
as needed. Contrast this with an industrial setting where the producer does
participate in finding defects. This difference resulted in the burden of
defect detection falling squarely on the remaining members of each study group.
2. With the exceptions of the
producer and moderator, all members of the study groups had their first
training and actual inspection experience during the course being taught.
Compare this with typical industrial settings where most inspectors have had
inspection training and may have participated in one or more previous
inspections.
3. None of the study group members
possessed any product knowledge. While our hypothesis is that individuals can
effectively contribute to inspections without having detailed product
knowledge, forming a team consisting primarily of these individuals may stretch
this hypothesis to its limits.
Considering
all the above conditions, the defect detection data for the study groups are
very encouraging.
Results to Date
During
five inspections, a total of 169 unique defects were identified in the product.
The most common type of defect detected (approximately 1/3 of all defects
detected to date) was violation of the coding standard. However, of the 169
defects, 8 were classified by the teams as semantic defects.
Table
2 and Table 3 below give the defect data for all inspection teams. Both the total
number of defects detected and the number of defects matching those found by the
control group (coverage) are presented.
                          Control     Team A      Team B      Team C      Team D
Defects Detected
  (% of total defects)    72 (43%)    33 (20%)    53 (31%)    77 (46%)    34 (20%)
Control Group Coverage      --          19          16          21          13

Table 2 - Number of defects found by all inspection teams.
                                 Control    Team A     Team B     Team C     Team D
Semantic Defects Detected
  (% of all semantic defects)    5 (63%)    4 (50%)    1 (13%)    4 (50%)    5 (63%)
Semantic Control Group
  Coverage                         --          3          1          2          3

Table 3 - Number of semantic defects found by all inspection teams.
Of the
169 defects detected, many were detected by more than one inspection team. The
percentage of defects located by multiple teams decreases as the number of
teams is increased. It should be noted, however, that the decrease is somewhat
less dramatic when only looking at semantic defects. This indicates that while
the inspection teams varied in their detection of individual defects, they
located semantic defects with
reasonable consistency. The
distribution of all defects across the inspection teams, including the control
group, is shown in Table 4. Table 5 summarizes the distribution for the
semantic defects.
Number of teams detecting    Number of Defects    Percentage of Total Defects Found
Detected by 1 Team                  105                         62%
Detected by 2 Teams                  44                         26%
Detected by 3 Teams                   9                          5%
Detected by 4 Teams                   6                          4%
Detected by 5 Teams                   5                          3%

Table 4 - Distribution of total defects found by teams
Number of teams detecting    Number of Semantic Defects    Percentage of Semantic Defects Found
Detected by 1 Team                        3                              38%
Detected by 2 Teams                       2                              25%
Detected by 3 Teams                       1                              13%
Detected by 4 Teams                       1                              13%
Detected by 5 Teams                       1                              13%

Table 5 - Distribution of semantic defects found by teams
Tables 4 and 5 present interesting data. However, one cannot compare the
inspection teams with the control group without first discussing the
effectiveness of the control group itself. Of the 169 defects reported, the
control group missed 97, including three semantic defects. As such, the control
group cannot be used as an absolute indicator of the defects present in the
product, but only as a reference point for relative comparisons.
Table
6 lists the summary inspection data for all five groups. Note that the study
groups spent approximately the same amount of time preparing for the
inspections as the control group.
Therefore, it does not appear that the lack of product knowledge
increased the amount of preparation time required.
                                       Control   Group A   Group B   Group C   Group D
Team Size                                 4         7         4         7         4
Total Preparation Time (Hours)           6.25      9.2       5.6       9.5       7.6
Preparation Time per Person (Hours)      1.56      1.31      1.40      1.35      1.9
Total Defects Found                      72        33        53        77        33
Total Defects per Hour of Prep           11.5      3.6       9.5       8.1       4.3
Total Defects Found per KLOC             137       63        101       147       63
Semantic Defects Found                    5         4         1         4         5
Semantic Defects per Hour of Prep         0.8       0.4       0.2       0.4       0.7
Semantic Defects Found per KLOC          10         8         2         8        10

Table 6 - Summary inspection data (KLOC: thousand lines of code)
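For readers who wish to verify the derived rates in Table 6, the small fragment below reproduces the arithmetic for the control group row (72 defects, 6.25 total preparation hours, 525 non-comment source lines); the other rows follow the same calculation.

/* Worked example of the derived rates in Table 6 for the control group. */
#include <stdio.h>

int main(void)
{
    const double ncsl       = 525.0;
    const double prep_hours = 6.25;
    const int    defects    = 72;

    printf("defects per hour of prep: %.1f\n", defects / prep_hours);       /* 11.5 */
    printf("defects per KLOC:         %.0f\n", defects / (ncsl / 1000.0));  /* 137  */
    return 0;
}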
Analysis and Potential
Impact
Analysis
of the individual defects recorded by the inspection teams and the
textual descriptions of those defects reveals that nearly every defect detected
by the study groups was prompted by a comparison to an “oracle.” The concept of
the oracle is simple. If inspectors have something to compare the product
against, they can discover differences and report defects or at least raise a
question. The oracle can be a coding standard, a design document, a maintenance
document, comments in the code, or even other parts of the code itself. In
general, if two things do not appear to match, one should suspect a defect.
While
the sample size used in this study is too small for statistical significance, a
trend can be observed in the resulting data. Groups A and D found the fewest
defects overall; however, they did the best job of identifying semantic defects.
Group B, on the other hand, reported a large
number of defects but did not identify as many of the semantic defects. In fact, Group B reported far more
typographical errors and coding standard violations, while Groups A, C and D
tended to ignore superficial errors in favor of more significant defects. Perhaps the use of automated techniques for
enforcing coding standards (such as “pretty-printers” and format analyzers)
would prevent many of these defects from entering the inspection in the first
place. This would allow teams to remain
more closely focused on locating logical and functional defects.
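As an illustration, the fragment below shows the kind of superficial coding-standard violation that an automated formatter or standard checker could remove before an inspection; the names (MAX_SPEED, set_limit, speed) are invented for the example.

/* Illustrative only: a superficial violation an automated pretty-printer or
   standard checker could remove before the inspection, leaving the team
   free to concentrate on semantic defects. */
#define MAX_SPEED 30

extern void set_limit(int limit);

void clamp_speed(int speed)
{
    /* As typically submitted, with layout and literal-constant violations:
       if(speed>30){set_limit(30);}                                        */

    /* After automatic reformatting and a symbolic-constant check: */
    if ( speed > MAX_SPEED )
    {
        set_limit( MAX_SPEED );
    }
}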
As
stated earlier, the study groups had no direct product knowledge. Instead, they
relied on the product knowledge contained in the supporting materials. The
importance of these materials as the oracle adds further incentives for
organizations to conduct software development in a defined and controlled
manner that produces key artifacts at specific stages of development. As an
example, consider the semantic defect found by all five inspection teams; in
this case, the code itself acted as the oracle.
if ( Return_Value == SUCCESS )
{
    send_message ( FAIL_MESSAGE );
}
The
inspectors concluded that a defect was present because of the apparent
contradiction of a “successful” return value resulting in the generation of a
failure message. In fact, the defect is
that the comparison operator should be an inequality instead of an
equality. Finding this defect was only
possible because the coding standard mandated the use of enumerated types or
symbolic constant names instead of literal constants.
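For clarity, a minimal sketch of the repair follows. Only SUCCESS, FAIL_MESSAGE, Return_Value, and send_message appear in the fragment above; the Status type and the FAIL_MESSAGE value are hypothetical, declared here only in the spirit of the coding standard's rule on symbolic constants.

/* A minimal sketch of the repaired code; supporting declarations are
   hypothetical. */
typedef enum { SUCCESS, FAILURE } Status;

#define FAIL_MESSAGE 1            /* hypothetical message identifier */

extern void send_message(int message);

void report_status(Status Return_Value)
{
    if ( Return_Value != SUCCESS )   /* corrected: inequality, not equality */
    {
        send_message ( FAIL_MESSAGE );
    }
}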
Other
oracles useful in identifying defects included the function header comments
(which stated expected pre- and post-conditions) as well as in-line
comments.
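A hypothetical example (not taken from the inspected module) shows how a header comment can serve as such an oracle; here the stated post-condition and the loop bounds disagree, which an inspector should raise as a defect.

/* Hypothetical illustration of a header comment acting as an oracle. */

/*
 * average_reading
 * pre:  count > 0
 * post: returns the mean of readings[0] .. readings[count-1]
 */
double average_reading(const int readings[], int count)
{
    int i;
    int sum = 0;

    for (i = 1; i < count; i++) {   /* starts at 1, so readings[0] is skipped */
        sum += readings[i];
    }
    return (double) sum / count;
}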
Some
defects found by the control group were missed by the study groups. These
defects were generally limited to cases where supporting materials were either
incomplete or incorrect. It is therefore to be expected that the study groups
would miss these defects, since there was no oracle to use.
An
interesting observation was made when examining those defects missed by the
control group. In three cases, the control group failed to detect a dangerous
coding practice that all study groups detected. In one case, an integer
variable was being compared to a floating-point constant. The producer of the
code identified this as a case of being too familiar with the product under
inspection, as this was old code that had not previously malfunctioned. In this
instance, the lack of product knowledge allowed the study groups to view the
product from a different perspective from that of the control group. We feel
this resulted in a more robust evaluation of the product.
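The following fragment is illustrative only and is not the actual code under inspection; the function names are invented. It shows why comparing an integer variable against a fractional floating-point constant is dangerous.

/* Illustrative only: an integer compared against a fractional
   floating-point constant can never be equal, so the branch is silently
   dead. */
extern int  get_encoder_count(void);   /* hypothetical */
extern void stop_motors(void);         /* hypothetical */

void check_position(void)
{
    int position = get_encoder_count();

    if (position == 0.5)   /* position is promoted to double; never equal */
    {
        stop_motors();
    }
}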
Plans for the Future
Our
plans for the future include the following:
Gathering additional
inspection data on this product.
Incorporating a defect
classification and severity standard into the process and evaluating the results.
Collecting sufficient data to
establish the repeatability of the inspection process.
Determining which type of
oracle is most useful in identifying defects.
Evaluating if inspectors
trained in the Personal Software Process discipline are more effective than
others.
Acknowledgments
The
authors would like to thank the following people and organizations: Dr. Jim
Tomayko of the Master of Software Engineering program at the Software
Engineering Institute, Carnegie Mellon University, for teaching us formal
software inspections and being an enthusiastic supporter of our idea. Watts
Humphrey for asking us good questions. Manuel Rosso-Llopart for creating the
coding standard. The students in our classes for having fun watching the video,
learning about formal software inspections, and inspecting the code. Celia
Menzia for inspecting this paper (without any product knowledge) and finding
many more defects than we could have imagined were there. Finally, the University of Massachusetts at
Lowell and Merrimack College for having the vision to offer software
engineering courses and allowing us the latitude to develop and adjust them as
we saw fit.
References
[Ackerman] A. Frank Ackerman, Lynne S. Buchwald, and Frank H. Lewski, "Software Inspections: An Effective Verification Process," IEEE Software, May 1989.
[Fagan] M. E. Fagan, "Design and Code Inspections to Reduce Errors in Program Development," IBM Systems Journal, Vol. 15, No. 3, 1976, pp. 182-211.
[Humphrey 94-1] Watts S. Humphrey, A Discipline for Software Engineering, Addison-Wesley, Reading, Massachusetts, 1995.
[Humphrey 94-2] Watts S. Humphrey, "Disciplined Software Engineering Course Plan," Software Engineering Education Workshop, Sorrento, Italy, May 21, 1994.
[Nordwall] Bruce D. Nordwall, "Robot Replacing Humans to Service Shuttle Tiles," Aviation Week and Space Technology, June 27, 1994, p. 76.
[Tomayko] James Tomayko and James Murphy, "Materials for the Instruction of Formal Inspections," Software Engineering Institute (SEI) Academic Series, 1993.
©1995 Howard E. Dow
©1995 James S. Murphy