In “Embracing Change with Extreme Programming” (IEEE Computer, 1999), Kent Beck writes:
Some methodologies, like Cleanroom, prohibit programmers testing or in some cases even compiling their own programs.
As evidence of this he cites the text Cleanroom Software Engineering: Technology and Process, by, among others, yours truly. I for one have never said that people should not run unit tests, nor does the above text say that. I emailed Mr. Beck about it, and he confirmed that he did not get this from the cited text. Whew!
Just to be clear, here is my position. I would never tell anyone not to run a test they felt was necessary. I may personally believe there are efficient and inefficient ways to test, but I’m not an expert in every domain. The people who create a product are responsible for the consequences of releasing that product, and should act accordingly.
The Purpose of Testing
The following is my philosophical position on software testing.
The purpose of software testing is to gain evidence that software release will not be harmful.
In particular, I believe it is impractical and inefficient to take the position that the purpose of testing is to find bugs. That position leads to poor practice, and it produces information that isn’t really “actionable.”
Some may insist that the purpose of testing is specifically to find bugs, but they’re misguided. If you want to find bugs, testing is a poor way to do it. Use inspections or verification. But more on that later.
I’m a late-comer to Cleanroom. Since Harlan Mills introduced the notion to the world in the 1980s, it has been an ever-evolving set of practices. I’d say the following are the essential elements of Cleanroom software engineering.
- Emphasize defect prevention rather than defect correction. I’d say this is the fundamental characteristic of Cleanroom.
- Statistical certification of software quality.
- Iterative and incremental development under statistical process control.
- Formal or rigorous specification.
- Stepwise refinement with verification.
Okay, I cheated. Here’s what Harlan Mills, Mike Dyer, and Rick Linger have to say on the subject in “Cleanroom Software Engineering,” IEEE Software, 14(2), 1987.
With the Cleanroom process, you can engineer software under statistical quality control. As with cleanroom hardware development, the process’s first priority is defect prevention rather than defect removal (of course, any defects not prevented should be removed). This first priority is achieved by using human mathematical verification in place of program debugging to prepare software for system test.
Its next priority is to provide valid, statistical certification of the software’s quality through representative-user testing at the system level. The measure of quality is the mean time to failure in appropriate units of time (real or processor time) of the desired product. The certification takes into account the growth of reliability achieved during system testing before delivery.
To gain the benefits of quality control during development, Cleanroom software engineering requires a development cycle of concurrent fabrication and certification of product increments that accumulate into the system to be delivered. This lets the fabrication process be altered on the basis of early certification results to achieve the quality desired.
Unit testing is not mentioned in the article, because the emphasis is on breaking the code-compile-debug cycle.
Prohibiting Unit Testing
So, unit testing. Phil Hausler, Rick Linger, and Carmen Trammell have this to say in “Adopting Cleanroom Software Engineering With a Phased Approach,” IBM Systems Journal, March 1994.
In traditional, craft-based software development, errors were accepted as inevitable, and programmers were encouraged to get software into testing quickly in order to begin debugging. Programs were subjected to unit testing and debugging by their authors, then integrated into components, subsystems, and systems for more debugging. Product use by customers resulted in still more debugging to correct errors discovered in operational use. The most virulent errors were often the result of fixes to other errors, and it was not unusual for software products to reach a steady-state error population, with new errors introduced as fast as old ones were fixed. Today, however, craft-based processes that depend on testing and debugging to improve reliability are understood to be inefficient and ineffective. Experience has shown that craft-based processes often fail to achieve the level of reliability essential to a society dependent on software for the conduct of human affairs.
They cite the Adams data in support of the above. (E. N. Adams, “Optimizing Preventive Service of Software Products,” IBM Journal of Research and Development, 28(1):2-14, 1984.)
I believe it is clear from the above what sort of practices they are targeting. They go on to write the following.
In the Cleanroom process, correctness is built into the software by development teams through a rigorous engineering process of specification, design, and verification. The more powerful process of team correctness verification replaces unit testing and debugging, and software enters system testing directly, with no execution by development teams. All errors are accounted for from first execution on, with no private unit testing necessary or permitted.
So there you have it; the prohibition against unit testing. This is done to break the cycle of code-compile-test by taking the test and possibly compile phases out of the hands of the developers. And I agree, up to a point. I would even go so far as to claim that Kent Beck might agree, but break the cycle in a different way. He requires you to write the tests first. Then you write the code that passes these tests.
Inspection is Insufficient
Software inspection is the most effective means to find and remove defects. I can say this because the studies consistently support it. Even informal code reviews are far more effective than testing, when both are done properly.
The fact is that inspections are not enough. Nobody understands the semantics of modern high-level languages. I’m serious; you do not. You may think you do, in which case you are dangerously deluded.
There are three primary reasons for this.
- Language standards are seldom complete or consistent. It is possible, for instance, to have legally constructed C programs whose meaning is not specified by the C standard. Compiler writers are free to make their own decisions.
- Compiler writers make mistakes. Library writers make mistakes.
- Computers are weird. Can you explain the IEEE-754 standard to me, along with the details of the particular implementation you are using? How about the fact that, in the two’s-complement world, there is one more negative number than there are positive numbers?
The short of it is that you have to execute the code. You have to run tests. I see nothing wrong with test early, test often. Is the library documentation insufficient? Then you’re going to code up and run some tests, aren’t you?
Even if you work at the assembly level, and don’t worry about compilers and high-level languages, computers are still a bit dodgy. Are “add eax,dword 1” and “inc eax” the same? Many emulators assume they are, but they differ in one particular way. Do you know what it is? The people who write malicious code do.
Testing is insufficient, since you cannot prove the absence of defects, bad behavior, or malicious code by testing. Inspection, as I just argued, is insufficient for a number of reasons.
The answer is verification. Verification is challenging, mathematical, and requires a precise model of the system. Fortunately, automated verification is on its way. By creating a formal model of the processor, warts and all, and then by computing the functional behavior of code, one can achieve results that cannot be achieved by either testing or inspection.
Automated verification is essential. Manual verification is simply not practical, thanks to the complexity of modern processors. I am capable of proving the correctness of an algorithm, but then it is implemented on a finite precision machine with lots of non-linear and discontinuous behavior, and all bets are off. I would argue that humans are simply not good enough at the bookkeeping to do the job for large software systems. Computers are.
Projects like CERT’s Function Extraction (FX) effort may finally deliver on the promise of software verification. Until then, you’ll just have to run your tests.
Do It Correctly
Here’s my advice on unit testing. It’s short. Feel free to extend and revise it as necessary.
- Have someone other than the programmer write the unit test.
- Test the interface. That is, write the test as a user of the module, not the developer. The same could be said for documentation.
- Test the GUI. It isn’t as hard as you think it is.
- Write your tests to generate random values and also test boundary values.
- For stateful modules, write tests that generate sequences of inputs.