Encapsulating Your Way To Better Unit Tests

In an earlier post, I discussed the qualities of good unit tests. Primarily, good unit tests are fast, isolated, repeatable, self-validating and timely.

At first blush, that seems easier said than done. We write distributed software and connect to databases. Some of those databases live in another country.

So, how can a test of an object that sets up connections to a database, primes it with test data, consumes that data, manipulates it and then presents it in a UI actually be fast, isolated, repeatable and self-validating? If by fast we mean microseconds per test (that’s exactly what we mean), we’re toast as soon as we even talk to the database. How do we isolate our tests when we write to a persistent store while another test is running in parallel against the same store? What happens when the database fails for some reason, and the data has been partially changed? Is the error in our code, or in the database? And self-validating? Come on! Our UIs need to be poked, prodded and looked at to be tested.

Unit-testing Nirvana sounds like a ton of work.

Actually, it’s the work we already do.

Most of us use object-oriented languages because we buy into the idea of encapsulation; said another way, we want to reduce coupling to particular implementations. Even if polymorphism and inheritance don’t float your boat, old-fashioned structured design pushes this same idea. We simply hide (encapsulate) implementation behind function calls, so we can focus on the work at hand.

For example, assume we have a window where individual elements within it may be visible or invisible based on user preferences. What is it about the algorithm to determine the initial visibility for elements of a window that requires a database or a .config file? Nothing, right? Either approach will work. So would any other source the underlying program might switch to later. All the algorithm really cares about is that it gets configuration data from somewhere. So we should encapsulate that concept.

We know our window needs to be initialized with some configuration data. The window doesn’t really care where it came from. A candidate method might look something like this:

using System.Collections.Generic;
using System.Windows.Forms;

class MyWindow : System.Windows.Forms.Form
{
    // The window controls whose visibility we can turn on or off.
    private readonly List<Control> _toggleVisibility = new List<Control>();

    // ... rest of the window elided ...

    // Applies the supplied settings; a control with no entry keeps its current state.
    public void SetInitialWindowVisibility(IDictionary<string, bool> visibilities)
    {
        foreach (var control in _toggleVisibility)
        {
            bool visible;
            if (visibilities.TryGetValue(control.Name, out visible))
            {
                control.Visible = visible;
            }
        }
    }
}

We honor the Single Responsibility Principle: No database involved. No System.Configuration.ConfigurationManager. If high cohesion and low coupling are the benchmarks of good design, then this is pretty decent.

Here’s the payoff — this approach gave us a good unit-testable design for free:

  • It’s fast: for our test, we simply pass a preloaded dictionary and verify that the window controls’ visibilities match the dictionary state.
  • It isolates the error: if the unit test fails here, it’s because the visibility-setting behavior is broken. Period. Not a database, not a parser, just the method under test.
  • It’s repeatable: every time we run it, we get the same results. No one on the outside can poison the test during its run.
  • It’s self-validating: our unit test need only iterate over the embedded controls collection, comparing each control.Visible value with the expected one. No popping up the user interface to inspect it by hand.

Sure, that configuration dictionary needs to be initialized somewhere, and that code should be tested as well. But if it connects to a database to get the information, keep the test for that code outside the body of unit tests. (Recall that unit tests should be run on the developers’ machines, often. If they aren’t fast, they won’t be run at all.) Definitely include database tests in an integration or smoke test suite, though. Those are run much less frequently, so they can afford to take a bit more time.

For the adventurous, you can raise the bar even more: what about this algorithm requires that it be implemented in a System.Windows.Forms.Form derivative? Can the implementation be refactored to remove this dependency?
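One possible answer, as a minimal sketch rather than a definitive design: pull the algorithm out behind a small abstraction so it never mentions Windows Forms at all. The names IToggleable, VisibilityInitializer and FakeElement are hypothetical, invented here for illustration. A real window would wrap each control in something that implements the interface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction: anything with a name and a visibility flag.
public interface IToggleable
{
    string Name { get; }
    bool Visible { get; set; }
}

// The visibility algorithm, with no reference to System.Windows.Forms.
public static class VisibilityInitializer
{
    public static void Apply(IEnumerable<IToggleable> elements,
                             IDictionary<string, bool> visibilities)
    {
        foreach (var element in elements)
        {
            bool visible;
            if (visibilities.TryGetValue(element.Name, out visible))
            {
                element.Visible = visible;
            }
        }
    }
}

// Test double standing in for a real control.
public sealed class FakeElement : IToggleable
{
    public FakeElement(string name, bool visible)
    {
        Name = name;
        Visible = visible;
    }

    public string Name { get; private set; }
    public bool Visible { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var toolbar = new FakeElement("toolbar", true);
        var statusBar = new FakeElement("statusBar", false);
        var sidebar = new FakeElement("sidebar", true); // no entry in the settings

        var settings = new Dictionary<string, bool>
        {
            { "toolbar", false },
            { "statusBar", true }
        };

        VisibilityInitializer.Apply(new[] { toolbar, statusBar, sidebar }, settings);

        Check(!toolbar.Visible, "toolbar should be hidden");
        Check(statusBar.Visible, "statusBar should be shown");
        Check(sidebar.Visible, "sidebar should keep its original state");
        Console.WriteLine("all checks passed");
    }

    private static void Check(bool condition, string message)
    {
        if (!condition) throw new Exception(message);
    }
}
```

The test now needs no window, no message loop and no UI assembly, so it can run anywhere the rest of the unit tests run.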

Qualities of Good Unit Tests

So, you’ve been tasked with ensuring your code has unit tests. What does that mean, really?

According to Wikipedia,

unit testing is a software design and development method where the programmer gains confidence that individual units of source code are fit for use. A unit is the smallest testable part of an application. In procedural programming a unit may be an individual program, function, procedure, etc., while in object-oriented programming, the smallest unit is a method, which may belong to a base/super class, abstract class or derived/child class. — http://en.wikipedia.org/wiki/Unit_testing

If you understand that the primary beneficiaries of unit tests are developers, then it makes sense that we should make our unit tests as helpful to developers as we possibly can.

Brett Schuchert and Tim Ottinger, old hands at unit testing, provide some pointers. They describe good unit tests as being “FIRST”:

  • Fast

    If a developer hesitates to run unit tests because they take a long time to run, the tests won’t be run at all. Unit tests need to be run often (as often as you build: the compiler can only check for correct syntax, while the tests ensure you got the semantics right), so they need to be cheap and easy to run. One test should take a small fraction of a second. A test suite that takes longer than a few seconds to run is painfully slow.

  • Isolated

    This actually means two things:

    • Unit tests should be small enough that the bug is obvious as soon as the test is viewed. A unit test tests one feature of one class.
    • Unit tests should never depend on the order in which the tests are run. If a test breaks because some other test did or did not successfully run before it, then the test can no longer be run in isolation.
  • Repeatable

    Tests must be able to be run over and over without any intervention. No hand-loading of a database or editing configuration files. They run whether there is a network or not. Unit tests do not test external systems.

  • Self Validating
    It must be obvious whether a test passes or fails. We never have to interpret results in a database or a file somewhere.
  • Timely
    Tests are written at the right time. Many argue the right time is immediately before the code being tested is written (I’m usually one of them). Regardless, unit tests should always be written before the tested code is checked into source control.

Michael Feathers (the author of one of the original CppUnit unit testing frameworks) also describes some features that should never be in a unit test. He states a test is not a unit test if:

  • It talks to the database (not fast, isolated nor repeatable)
  • It communicates across the network (not fast, isolated nor repeatable)
  • It touches the file system (not fast, isolated nor repeatable)
  • It can’t run at the same time as any of your other unit tests (not isolated or repeatable)
  • You have to do special things to your environment (such as editing config files) to run it (not fast, isolated nor repeatable)

He goes on to say:

    Tests that do these things aren’t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes.

So, are your unit tests FIRST, or are they painful?

Why Unit Test?

Nobody changes until the pain of staying the same becomes greater than the pain of change ~ Anonymous

If the code I come across in my work is any example, most developers haven’t drunk the unit testing Kool-Aid. Given the old saw about change and pain, I can almost understand. Almost.

So, why should I unit test?

Let’s answer that question by understanding who benefits from them. Just like any other software we write, when we understand who we write our software for, we’re more likely to write software that will serve them best.

Unit Tests are for programmers. If you write code, they are for you.

Big Deal

Sure, unit tests help demonstrate a body of code does what it should. So does white box testing, functional testing, and integration testing. Really, wouldn’t our time be better served stepping through the code for some test cases and a sample app or two, and letting those fiends in QA figure out ways to break our code?

In reality, verifying that your code works is just one benefit of unit tests. There are other, farther-reaching benefits once you realize where the bulk of software development cost really lies, which is in maintaining and changing code after it is written:

  • They document our code in a way that comments could never hope to match.
  • They protect dependent code from future breaking changes.
  • They force us to decrease coupling (improving our designs, even more so if you take unit testing all the way to TDD).
  • They help us uncover (and prevent) bugs earlier and faster (it beats the pants off using the debugger to find a break).

Note that those benefits are not just about ensuring the tested code works right now. They’re also about ensuring our code won’t break tomorrow. They’re about ensuring the newly hired developer understands what the code does so she doesn’t accidentally break it with a “fix”. And when the new developer still doesn’t understand, the tests will complain before your customers can.

Profoundly Better Code Documentation

Most .NET programmers have seen structured XML used to document code. Likely, we’ve filled out several <summary>, <returns> and <param> tags. Maybe several others. The intent behind them (and coding standards mandating them) is to ensure our code is documented well enough that the poor slobs who come after us know how to use it.

The problem here is a simple one: comments are pathological liars. They get out of sync with the actual behavior of our code. They often don’t reveal enough about the contract of the code. And when they do, they certainly can’t enforce what is documented. (Does that <param> accept null values? What happens if we pass a null there? What is the valid range for that integer <param>? What happens to that Stream when there is no data to write?)

Because comments aren’t executable, they have no way to enforce correct behavior. Unit tests, on the other hand, demonstrate exactly how to use a bit of code. Because they are executable, they cannot silently get out of sync with the code: when the behavior changes, the test fails. Proper unit tests verify that an exception is thrown when an invalid argument is supplied, that an integer is in range, and what happens to a stream passed to a function.
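To make that concrete, here is a small sketch of tests acting as documentation. TextUtil.Repeat is a hypothetical method made up for this example, and the tests use hand-rolled Expect helpers instead of a real test framework so the listing stands alone. Each test name answers one of the questions a <param> comment tends to leave open.

```csharp
using System;
using System.Text;

public static class TextUtil
{
    // Hypothetical method under test: returns text repeated count times.
    public static string Repeat(string text, int count)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (count < 0) throw new ArgumentOutOfRangeException("count");

        var builder = new StringBuilder(text.Length * count);
        for (int i = 0; i < count; i++) builder.Append(text);
        return builder.ToString();
    }
}

public static class TextUtilTests
{
    // Does text accept null? No.
    public static void NullTextIsRejected()
    {
        ExpectThrows<ArgumentNullException>(() => TextUtil.Repeat(null, 2));
    }

    // What is the valid range for count? Zero and up.
    public static void NegativeCountIsRejected()
    {
        ExpectThrows<ArgumentOutOfRangeException>(() => TextUtil.Repeat("ab", -1));
    }

    public static void ZeroCountYieldsEmptyString()
    {
        Expect(TextUtil.Repeat("ab", 0) == "");
    }

    public static void TextIsRepeatedCountTimes()
    {
        Expect(TextUtil.Repeat("ab", 3) == "ababab");
    }

    private static void Expect(bool condition)
    {
        if (!condition) throw new Exception("expectation failed");
    }

    // Passes only if the action throws an exception of type T.
    private static void ExpectThrows<T>(Action action) where T : Exception
    {
        try { action(); }
        catch (T) { return; }
        throw new Exception("expected " + typeof(T).Name);
    }

    public static void Main()
    {
        NullTextIsRejected();
        NegativeCountIsRejected();
        ZeroCountYieldsEmptyString();
        TextIsRepeatedCountTimes();
        Console.WriteLine("all checks passed");
    }
}
```

If someone later decides that a null text should be treated as an empty string, NullTextIsRejected fails on the next run, which a stale comment would never do.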

Protection Against Breaking Changes

Let’s say you’ve been assigned a task that involves adding a feature. You find a class that does 90% of what you need it to do, so you make your changes. Your change doesn’t change the signature of a method, but it does modify the semantics of a parameter just a teeny, tiny bit. You build the code and sure enough, the compiler is happy. Of course the compiler is happy – you didn’t modify the calling syntax – merely the semantics.

So you run the tests and they break. You look at the breaking tests and find they were checking for that very semantic change. Turns out there are several assemblies that depend on the original behavior. If you had tested only your change, you’d never know you broke other code until the bug reports started pouring in. The unit tests caught that before your users could.

A unit test from a previous release demonstrates behavior that code out there somewhere depends on. It’s a warning you should heed.

Improves Your Designs with Decreased Coupling

Wikipedia says:

A unit is the smallest testable part of an application. In procedural programming a unit may be an individual program, function, procedure, etc., while in object-oriented programming, the smallest unit is a method, which may belong to a base/super class, abstract class or derived/child class. — http://en.wikipedia.org/wiki/Unit_testing

Comp.software-eng.testing FAQ says:

A unit typically … does not include any called sub-components (for procedural languages) or communicating components in general.

Unit Testing: in unit testing called components (or communicating components) are replaced with stubs, simulators, or trusted components. Calling components are replaced with drivers or trusted super-components. The unit is tested in isolation.

Isolation is a big deal in unit testing. In almost every case, the only thing that should be tested in a unit test is one method of one class. All other participants in that method are stubs, simulators, and test-drivers. The only way you can get to that is by reducing your coupling. Unit tests are much easier to write if your classes depend on abstractions instead of concretions. You’ll also discover you simply cannot unit test a class that depends on the behavior of globals, singletons and most static members – no way, no how.
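As an illustration of depending on an abstraction instead of a concretion, consider a class that needs the current time. Calling the static DateTime.Now directly would tie every test to the wall clock. The sketch below injects a clock instead. IClock, SystemClock, StubClock and Greeter are hypothetical names chosen for this example.

```csharp
using System;

// Abstraction over "what time is it?".
public interface IClock
{
    DateTime Now { get; }
}

// Production implementation: the only place that touches the static member.
public sealed class SystemClock : IClock
{
    public DateTime Now { get { return DateTime.Now; } }
}

// Stub used by tests: always reports the time it was given.
public sealed class StubClock : IClock
{
    public StubClock(DateTime now) { Now = now; }
    public DateTime Now { get; private set; }
}

// Class under test: depends on IClock, not on DateTime.Now.
public sealed class Greeter
{
    private readonly IClock _clock;

    public Greeter(IClock clock) { _clock = clock; }

    public string Greet(string name)
    {
        string prefix = _clock.Now.Hour < 12 ? "Good morning, " : "Good afternoon, ";
        return prefix + name;
    }
}

public static class Program
{
    public static void Main()
    {
        var morning = new Greeter(new StubClock(new DateTime(2020, 1, 1, 9, 0, 0)));
        var afternoon = new Greeter(new StubClock(new DateTime(2020, 1, 1, 15, 0, 0)));

        Check(morning.Greet("Ada") == "Good morning, Ada");
        Check(afternoon.Greet("Ada") == "Good afternoon, Ada");
        Console.WriteLine("all checks passed");
    }

    private static void Check(bool condition)
    {
        if (!condition) throw new Exception("check failed");
    }
}
```

With the stub in place, both branches can be tested at any time of day, and the result is the same on every run.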

Test-Driven Development (TDD), which is not the same thing as unit testing but one way of applying it, helps improve designs even further by forcing you to think about your code the way a consumer of your code would. As a result, our APIs are no longer defined by how something is done (leaky abstractions); instead, they are defined by what our users want done, which then constrains our implementation. (TDD’s effect on unit testing is such a big subject that it warrants its own post.)

Better Tests

Finally, unit tests ensure the quality of our code in a way no other tests can. Non-unit tests can only verify how a body of code behaves when it is called the way the existing calling code calls it. Consider a function that counts words in a text stream. If that function is currently used only in a program that always supplies text streams of exactly 2KB, and there are no unit tests for that function, then the only behavior we can verify is that it counts words in 2KB streams, because that is all the calling code ever hands it.

We simply don’t know what will happen if we get 0 bytes, 2047 bytes, 2049 bytes or 2MB of data.

Now our requirements change, and we must consume just over 3KB of data. Our functional tests start blowing up. We can work through the call stack looking for the issue (let’s say it was an off-by-one error in our original function, which used a 2KB buffer with a zero-byte terminator … oops). Or we could have had unit tests that leveraged the knowledge that our word-counting function used a buffer of exactly 2KB, and written edge-case tests around that value.
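Here is what such edge-case tests might look like. This is a sketch under assumptions: WordCounter is a hypothetical stand-in for the function described above, it reads through a 2048-character buffer, and it treats any run of non-whitespace characters as one word. The tests probe the sizes on either side of the buffer boundary, plus the empty and very large cases.

```csharp
using System;
using System.IO;

public static class WordCounter
{
    public const int BufferSize = 2048; // assumed buffer size, in characters

    public static int Count(TextReader reader)
    {
        if (reader == null) throw new ArgumentNullException("reader");

        var buffer = new char[BufferSize];
        int words = 0;
        bool inWord = false; // carried across reads so a word split by the buffer counts once
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                bool isWordChar = !char.IsWhiteSpace(buffer[i]);
                if (isWordChar && !inWord) words++;
                inWord = isWordChar;
            }
        }
        return words;
    }
}

public static class WordCounterTests
{
    // Builds "aa aa aa ..." cut off at exactly length characters.
    public static string MakeText(int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++) chars[i] = (i % 3 == 2) ? ' ' : 'a';
        return new string(chars);
    }

    // A word starts at every third character, so the count is ceil(length / 3).
    public static int ExpectedWords(int length)
    {
        return (length + 2) / 3;
    }

    public static void Main()
    {
        int[] sizes = { 0, 1, 2047, 2048, 2049, 2 * 1024 * 1024 };
        foreach (int size in sizes)
        {
            int actual = WordCounter.Count(new StringReader(MakeText(size)));
            Check(actual == ExpectedWords(size), "wrong count for size " + size);
        }

        // One long word that straddles the buffer boundary must count once.
        string straddling = new string('a', 2047) + "bc";
        Check(WordCounter.Count(new StringReader(straddling)) == 1, "straddling word");

        // A word that starts right after a full buffer must still be seen.
        string afterBoundary = new string('a', 2048) + " b";
        Check(WordCounter.Count(new StringReader(afterBoundary)) == 2, "word after boundary");

        Console.WriteLine("all checks passed");
    }

    private static void Check(bool condition, string message)
    {
        if (!condition) throw new Exception(message);
    }
}
```

None of these cases can be reached through a caller that only ever supplies 2KB streams, which is the point: only a test that calls the unit directly can exercise them.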

In upcoming posts, I’ll cover some ways to maximize the benefit of unit tests, while minimizing the pain.
