Why Unit Test?

Nobody changes until the pain of staying the same becomes greater than the pain of change ~ Anonymous

If the code I come across in my work is any example, most developers haven’t drunk the unit-testing Kool-Aid. Given the old saw about change and pain, I can almost understand why. Almost.

So, why should I unit test?

Let’s answer that question by understanding who benefits from unit tests. As with any other software we write, when we understand who we’re writing for, we’re more likely to write software that serves them well.

Unit Tests are for programmers. If you write code, they are for you.

Big Deal

Sure, unit tests help demonstrate that a body of code does what it should. So do white box testing, functional testing, and integration testing. Really, wouldn’t our time be better spent stepping through the code for a few test cases and a sample app or two, and letting those fiends in QA figure out ways to break our code?

In reality, verifying that your code works is just one benefit of unit tests. There are other, further-reaching benefits once you realize where the bulk of software development cost really lies: in maintaining code long after it was first written.

  • It documents our code in a way that comments could never hope to match
  • It protects dependent code from future breaking changes
  • It forces us to decrease coupling (improving our designs – even more so if you take unit testing all the way to TDD)
  • It helps us uncover (and prevent) bugs earlier and faster (it beats the pants off using the debugger to find a break).

Note those benefits are not just about ensuring the tested code works right now. They’re also about ensuring our code won’t break tomorrow. They’re about ensuring the newly hired developer understands what the code does, so she doesn’t accidentally break it with a “fix”. And when the new developer still doesn’t understand, they’re about making sure the tests complain before your customers can.

Profoundly Better Code Documentation

Most .NET programmers have seen structured XML comments used to document code. Likely, we’ve filled out plenty of <summary>, <returns> and <param> tags, and maybe a few others. The intent behind them (and behind the coding standards that mandate them) is to ensure our code is documented well enough that the poor slobs who come after us know how to use it.

The problem here is a simple one: comments are pathological liars. They drift out of sync with the actual behavior of our code. They often don’t reveal enough about the contract of the code. And even when they do, they certainly can’t enforce what they document. (Does that <param> accept null values? What happens if we pass a null there? What is the valid range for that integer <param>? What happens to that Stream when there is no data to write?)

Because comments aren’t executable, they have no way to enforce correct behavior. Unit tests, on the other hand, demonstrate exactly how to use a bit of code. Because they are executable, they can’t silently drift out of sync with the code: when the behavior changes, a test fails. Proper unit tests verify that an exception is thrown when an invalid argument is supplied, that an integer argument stays within its valid range, and what happens to a stream passed to a function.
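Here is a minimal sketch of that idea. `TextUtil.Repeat` is a hypothetical method. The tiny `Check` helper stands in for the `Assert` class of a real framework such as NUnit, xUnit or MSTest, so the block compiles on its own. Each test name reads like one clause of the contract that an XML comment could only claim:

```csharp
using System;
using System.Text;

// Hypothetical code under test.
public static class TextUtil
{
    public static string Repeat(string text, int count)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count), "count must be >= 0");

        var result = new StringBuilder();
        for (int i = 0; i < count; i++)
            result.Append(text);
        return result.ToString();
    }
}

// Minimal stand-in for a test framework's Assert class (assumption: no framework referenced).
public static class Check
{
    public static void That(bool condition, string description)
    {
        if (!condition) throw new Exception("Test failed: " + description);
    }

    public static void Throws<TException>(Action action, string description) where TException : Exception
    {
        try
        {
            action();
        }
        catch (TException)
        {
            return; // the expected exception (or a subclass of it) was thrown
        }
        throw new Exception("Test failed: expected " + typeof(TException).Name + " when " + description);
    }
}

// Executable documentation: each test states one clause of Repeat's contract.
public static class TextUtilTests
{
    public static void NullText_Throws() =>
        Check.Throws<ArgumentNullException>(() => TextUtil.Repeat(null, 3), "text is null");

    public static void NegativeCount_Throws() =>
        Check.Throws<ArgumentOutOfRangeException>(() => TextUtil.Repeat("ab", -1), "count is negative");

    public static void ZeroCount_ReturnsEmptyString() =>
        Check.That(TextUtil.Repeat("ab", 0) == "", "count is zero");

    public static void PositiveCount_ConcatenatesCopies() =>
        Check.That(TextUtil.Repeat("ab", 3) == "ababab", "count is three");

    // A real framework discovers tests through attributes such as [Test] or [Fact]; here we call them by hand.
    public static int RunAll()
    {
        NullText_Throws();
        NegativeCount_Throws();
        ZeroCount_ReturnsEmptyString();
        PositiveCount_ConcatenatesCopies();
        return 4;
    }
}
```

The tests answer the questions a <param> comment leaves open: null is rejected, negative counts are rejected, and zero is legal. If someone later changes any of those rules, a test fails.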

Protection Against Breaking Changes

Let’s say you’ve been assigned a task that involves adding a feature. You find a class that does 90% of what you need it to do, so you make your changes. Your change doesn’t alter any method signature, but it does modify the semantics of a parameter just a teeny, tiny bit. You build the code and, sure enough, the compiler is happy. Of course the compiler is happy – you didn’t modify the calling syntax, merely the semantics.

So you run the tests, and they break. You look at the failing tests and find they were checking for the very behavior you just changed. Turns out several assemblies depend on the original behavior. If you had tested only your change, you wouldn’t have known you broke other code until the bug reports started pouring in. The unit tests caught it before your users could.

A unit test from a previous release demonstrates behavior that code out there somewhere depends on. It’s a warning you should heed.
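A minimal sketch of such a warning follows. `Paging.GetPage` is a hypothetical shared API whose `page` parameter is 1-based. Quietly making it 0-based would change no signatures, and the compiler would be happy, but tests written in an earlier release pin the original semantics:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shared API. Its contract: 'page' is 1-based.
// Code in other assemblies relies on that, and the compiler can't see it.
public static class Paging
{
    public static List<T> GetPage<T>(IReadOnlyList<T> items, int page, int pageSize)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page), "page is 1-based");
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return items.Skip((page - 1) * pageSize).Take(pageSize).ToList();
    }
}

// Tests from an earlier release that pin those semantics.
// A "teeny, tiny" switch to 0-based paging still compiles; these tests are what would complain.
public static class PagingTests
{
    private static readonly IReadOnlyList<string> Items = new[] { "a", "b", "c", "d", "e" };

    public static bool PageOne_IsTheFirstPage() =>
        Paging.GetPage(Items, 1, 2).SequenceEqual(new[] { "a", "b" });

    public static bool LastPage_MayBePartial() =>
        Paging.GetPage(Items, 3, 2).SequenceEqual(new[] { "e" });

    public static bool PageZero_IsRejected()
    {
        try
        {
            Paging.GetPage(Items, 0, 2);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```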

Improves your Designs with Decreased Coupling
Wikipedia says:

A unit is the smallest testable part of an application. In procedural programming a unit may be an individual program, function, procedure, etc., while in object-oriented programming, the smallest unit is a method, which may belong to a base/super class, abstract class or derived/child class. — http://en.wikipedia.org/wiki/Unit_testing

Comp.software-eng.testing FAQ says:

A unit typically … does not include any called sub-components (for procedural languages) or communicating components in general.

Unit Testing: in unit testing called components (or communicating components) are replaced with stubs, simulators, or trusted components. Calling components are replaced with drivers or trusted super-components. The unit is tested in isolation.

Isolation is a big deal in unit testing. In almost every case, the only thing that should be tested in a unit test is one method of one class. All other participants in that method are stubs, simulators, and test-drivers. The only way you can get to that is by reducing your coupling. Unit tests are much easier to write if your classes depend on abstractions instead of concretions. You’ll also discover you simply cannot unit test a class that depends on the behavior of globals, singletons and most static members – no way, no how.
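A minimal sketch of that isolation, assuming a hypothetical `Greeter` class whose only collaborator is the current time. A version that read the static `DateTime.Now` directly could only be tested at the right time of day. If it depends on a small `IClock` abstraction instead (also hypothetical), a test can substitute a stub that returns any time it likes:

```csharp
using System;

// Hypothetical abstraction over "now", so the unit under test doesn't read the static DateTime.Now.
public interface IClock
{
    DateTime Now { get; }
}

// Production implementation: the only place that touches the static member.
public sealed class SystemClock : IClock
{
    public DateTime Now => DateTime.Now;
}

// Unit under test: one method of one class, whose only collaborator is an abstraction.
public sealed class Greeter
{
    private readonly IClock _clock;

    public Greeter(IClock clock)
    {
        _clock = clock ?? throw new ArgumentNullException(nameof(clock));
    }

    public string Greet(string name)
    {
        int hour = _clock.Now.Hour;
        string greeting = hour < 12 ? "Good morning" : hour < 18 ? "Good afternoon" : "Good evening";
        return greeting + ", " + name;
    }
}

// Test stub: reports whatever time the test dictates.
public sealed class StubClock : IClock
{
    public StubClock(DateTime now) { Now = now; }
    public DateTime Now { get; }
}
```

Because the stub fixes the time, the boundary at noon can be tested deterministically. That would be impossible if `Greet` read the clock itself.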

Test Driven Design (TDD), which is not the same thing as unit testing but one particular way to apply it, improves designs even further by forcing you to think about your code the way a consumer of your code would. As a result, our APIs are no longer defined by how something is done (leaky abstractions); instead, they are defined by what our users want done, which then constrains our implementation. (TDD’s effect on unit testing is such a big subject it warrants its own post.)

Better Tests
Better Tests
Finally, unit tests ensure the quality of our code in a way no other tests can. Non-unit tests can only verify how a body of code behaves when it is called the way its current callers call it. Consider a function that counts words in a text stream. If that function is currently used only in a program that always supplies 2KB text streams, and there are no unit tests for that function, then the only behavior we can verify is that it counts words correctly in streams of exactly 2KB, because that is all the calling code ever sends.

We simply don’t know what will happen if we get 0 bytes, 2047 bytes, 2049 bytes or 2MB of data.

Now our requirements change, and we must consume just over 3KB of data. Our functional tests start blowing up. We can either work back through the call stack looking for the issue (let’s say it was an off-by-one error in our original function, which used a 2KB buffer with a zero-byte terminator … oops), or we could already have had unit tests that used the knowledge that our word-counting function uses a buffer of exactly 2KB, with edge-case tests written around that value.
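Here is a minimal sketch of such edge-case tests. It assumes a hypothetical `WordCounter.CountWords` that reads characters (not bytes) from a `TextReader` in 2048-character chunks and treats any run of non-whitespace as a word. The buffer size is deliberately exposed so the tests can aim at the values on either side of it:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical word counter that reads in fixed-size chunks, like the 2KB buffer in the story.
// Words are runs of non-whitespace characters.
public static class WordCounter
{
    public const int BufferSize = 2048; // exposed so tests can target its edges

    public static int CountWords(TextReader reader)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));

        var buffer = new char[BufferSize];
        int words = 0;
        bool inWord = false; // carried across reads, so a word split by a chunk boundary counts once
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (char.IsWhiteSpace(buffer[i]))
                {
                    inWord = false;
                }
                else if (!inWord)
                {
                    inWord = true;
                    words++;
                }
            }
        }
        return words;
    }
}

// Edge-case tests placed around BufferSize: 0, n - 1, n, n + 1, and much larger inputs.
public static class WordCounterEdgeTests
{
    private static int Count(string text) => WordCounter.CountWords(new StringReader(text));

    private static void Expect(bool passed, string description)
    {
        if (!passed) throw new Exception("Test failed: " + description);
    }

    public static void RunAll()
    {
        int n = WordCounter.BufferSize;
        Expect(Count("") == 0, "empty input");
        Expect(Count(new string('a', n - 1)) == 1, "one character short of a full buffer");
        Expect(Count(new string('a', n)) == 1, "exactly one full buffer");
        Expect(Count(new string('a', n + 1)) == 1, "a word straddling the chunk boundary");
        Expect(Count(new string('a', n) + " b") == 2, "a word ending exactly at the boundary");
        Expect(Count(new StringBuilder().Insert(0, "word ", 1000).ToString()) == 1000, "many chunks");
        Expect(Count(new string('a', 2 * 1024 * 1024)) == 1, "2M characters, one word");
    }
}
```

None of these inputs could reach the function through a caller that only ever sends 2KB. That is exactly why they have to live in unit tests.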

In upcoming posts, I’ll cover some ways to maximize the benefit of unit tests while minimizing the pain.

Is Your Code SOLID: The Dependency Inversion Principle

The mother of all SOLID principles. Nail this one, and you’ll keep your codebase supple — ready for just about any change you throw at it. The Dependency Inversion Principle comes in two flavors:

  • HIGH LEVEL MODULES SHOULD NOT DEPEND UPON LOW LEVEL MODULES. BOTH SHOULD DEPEND UPON ABSTRACTIONS.
  • ABSTRACTIONS SHOULD NOT DEPEND UPON DETAILS. DETAILS SHOULD DEPEND UPON ABSTRACTIONS.

An example should make this clear.

Consider a very simple requirement: read a serial port for ASCII characters, find any appropriate stock symbol and associated data and copy that data onto a TCP/IP socket (don’t laugh – those were real requirements: a trading partner’s trade desk wanted a connection to a former employer’s trading platform, and the exchange had no more ports to grant).

The MacGyver’ed up solution ended up with a dependency graph that looked like this:

More abstractly, this program reads characters from a source (the serial port), does a lookup, and then writes textual data to a destination (a socket). A reasonable one-off, but ultimately a problem. Because the high level module (symbol lookup) depended on low level modules (an RS232 serial port reader and a socket), we could only use it in an environment with an RS232 port and a socket. Any other use would require so much refactoring that a rewrite actually started to make sense. It was a very simple hack … err … design that became very expensive to maintain.

Instead, if the symbol lookup module depended on a simple text reader abstraction, which the RS232 Serial Port reader then implemented, and the socket implemented a simple text writer abstraction, we suddenly open all the modules up to a world of other uses and environments:

Improved dependency graph

With this change, all the classes follow the OCP (the Open/Closed Principle). The behavior of the program can be changed by replacing the RS232 Serial Port Reader with some other character reader (that is, an extension); no other classes need to change. As long as our line-of-business module (symbol lookup) depends on abstractions (TextReader and TextWriter), the data will flow no matter where it comes from or goes to.
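A minimal sketch of the improved design follows. `SymbolLookup` and its symbol-to-data table are hypothetical. It depends only on `System.IO.TextReader` and `TextWriter`, so it never learns whether characters come from a serial port, a file, or a string:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// High-level module: it knows nothing about RS232 ports or sockets.
// It depends only on the TextReader/TextWriter abstractions in System.IO.
public sealed class SymbolLookup
{
    private readonly IReadOnlyDictionary<string, string> _symbolData; // hypothetical symbol -> data table

    public SymbolLookup(IReadOnlyDictionary<string, string> symbolData)
    {
        _symbolData = symbolData ?? throw new ArgumentNullException(nameof(symbolData));
    }

    // Reads one symbol per line and writes "SYMBOL data" for every symbol it recognizes.
    public void Pump(TextReader source, TextWriter destination)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (destination == null) throw new ArgumentNullException(nameof(destination));

        string line;
        while ((line = source.ReadLine()) != null)
        {
            string symbol = line.Trim().ToUpperInvariant();
            if (_symbolData.TryGetValue(symbol, out string data))
            {
                destination.WriteLine(symbol + " " + data);
            }
        }
        destination.Flush();
    }
}
```

In production, the source might hypothetically be a TextReader subclass wrapping the serial port, and the destination a StreamWriter over a NetworkStream. In a test, or a different environment, StringReader/StringWriter or Console.In/Console.Out plug in without touching SymbolLookup.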

Usually, DIP isn’t violated in such an over-the-top way as that. It’s often much more subtle:

using System.Collections.Generic;

namespace Numa.Infrastructure.Client
{
    public interface IUIHost
    {
        //...

        /// <summary>
        /// Get or Set Environment Configuration Setting
        /// </summary>
        Dictionary<string, string> EnvConfig
        {
            get;
            set;
        }
    }
}

Note the implementation detail: IUIHost.EnvConfig returns a Dictionary<string, string>. Not only does the interface depend on that detail, it forces the same dependency on every one of its consumers simply by making it part of the interface. A reasonable SOLID refactoring of this interface would probably replace the Dictionary<string, string> class with the IDictionary<string, string> interface.

This simple change does two things to improve the quality of our design:

  • It protects the clients of our abstraction from changes in our implementation of it. An alternative implementation might deal with huge result sets and load the data lazily. A concrete Dictionary can’t do that, but a class implementing the IDictionary interface can.
  • It documents a contract between our abstraction and clients of our abstraction – constraining both our designs, and the valid uses of our designs. The change says our interface returns an object that can map a string to another string.

Before we think we’re done, consider this: both the original and the changed code also say that the IUIHost.EnvConfig property supplies a type where:

  • String mappings can be added
  • String mappings can be removed
  • String mappings can be cleared

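In other words, this interface promises to support those additional features. If we don’t want to make all of those promises, we should choose an abstraction that better describes what we mean. For instance, if we don’t want to allow modifications to the result set, we might return an IEnumerable<KeyValuePair<string, string>> or, on .NET 4.5 and later, an IReadOnlyDictionary<string, string>. (The generic ICollection<T> would not help here: it still exposes Add, Remove and Clear.)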
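Here is a minimal sketch of the read-only option, assuming .NET 4.5 or later, where IReadOnlyDictionary and System.Collections.ObjectModel.ReadOnlyDictionary exist. It also drops the setter; that is my assumption about what the host needs, not something the original interface says. `UIHost` is a hypothetical implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public interface IUIHost
{
    /// <summary>
    /// Gets the environment configuration settings as a read-only view.
    /// </summary>
    IReadOnlyDictionary<string, string> EnvConfig { get; }
}

// Hypothetical implementation: the concrete Dictionary stays a private detail.
public sealed class UIHost : IUIHost
{
    public UIHost(IDictionary<string, string> settings)
    {
        if (settings == null) throw new ArgumentNullException(nameof(settings));

        // Copy first, so whoever holds the original dictionary can't change our settings,
        // then expose the copy through a read-only wrapper.
        EnvConfig = new ReadOnlyDictionary<string, string>(new Dictionary<string, string>(settings));
    }

    public IReadOnlyDictionary<string, string> EnvConfig { get; }
}
```

The interface now promises only lookup and enumeration. Even a caller that casts back to IDictionary<string, string> gets a NotSupportedException from Add, because ReadOnlyDictionary rejects modifications at run time.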

Another place we see the DIP violated is in code written from a procedural perspective: code that depends on specific function calls instead of calling through abstractions or interfaces. Such code often lives in types with names like “FooManager” or “FooHelper”. For example:

using System;
using System.Security.Principal;

public static class AuthorizationService
{
    public static bool HasPermission(
        string action,
        IIdentity identity)
    {
        // something: the real permission check is elided here.
        // Placeholder return so the sample compiles.
        return false;
    }
}

public class MyClass
{
    public void DoSomething()
    {
        // Both the service and the identity are reached through static (global) members.
        bool permitted = AuthorizationService.HasPermission(
            "MyClass.DoSomething",
            System.Threading.Thread.CurrentPrincipal.Identity);

        if (permitted)
        {
            // blah
        }
    }
}

MyClass.DoSomething depends on the following implementation details:

  1. where the Identity comes from (the Thread’s CurrentPrincipal)
  2. where the AuthorizationService comes from (it’s a global)
  3. the specific implementation of the AuthorizationService.

To understand this code’s resistance to change, try writing a unit test for MyClass.DoSomething() that calls a test stub instead of the real AuthorizationService. It simply can’t be done.

AuthorizationService exposes two details that its callers inevitably come to depend on. There is only one of it (implied by the fact that it’s a static class), and its services are always reached through a reference to the class name. In other words, you can’t swap in a different implementation without changing and rebuilding every client.

An improved design might look something like:

using System;
using System.Security.Principal; 

public interface IAuthorizationService
{ 

   bool HasPermission(string action, IIdentity identity); 

} 

public class MyClass
{ 

   IIdentity _user;
   IAuthorizationService _authSvc; 

   public MyClass(IAuthorizationService authService,
      IIdentity user)
   { 

      _authSvc = authService;
      _user = user; 

   } 

   public void DoSomething()
   { 

      bool permitted = _authSvc.HasPermission(
         "MyClass.DoSomething", _user); 

      if (permitted)
      {
         // blah
      } 

   } 

}

This technique, called Constructor Dependency Injection, breaks MyClass’s dependency on the detail that an AuthorizationService is implemented a particular way, and that it’s supplied in a particular way. By breaking that dependency, our designs become much easier to change and test.
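With the dependencies injected, the test that was impossible before becomes a few lines. In this minimal sketch, `StubAuthorizationService` is a hypothetical hand-written test double that gives a canned answer and records what it was asked. The interface and MyClass are repeated from the improved design so the block compiles on its own:

```csharp
using System;
using System.Security.Principal;

// Same shapes as the improved design, repeated so this sketch is self-contained.
public interface IAuthorizationService
{
    bool HasPermission(string action, IIdentity identity);
}

public class MyClass
{
    private readonly IIdentity _user;
    private readonly IAuthorizationService _authSvc;

    public MyClass(IAuthorizationService authService, IIdentity user)
    {
        _authSvc = authService;
        _user = user;
    }

    public void DoSomething()
    {
        bool permitted = _authSvc.HasPermission("MyClass.DoSomething", _user);

        if (permitted)
        {
            // blah
        }
    }
}

// Test double: answers with a canned result and records every question it was asked.
public sealed class StubAuthorizationService : IAuthorizationService
{
    private readonly bool _answer;

    public StubAuthorizationService(bool answer) { _answer = answer; }

    public int CallCount { get; private set; }
    public string LastAction { get; private set; }
    public IIdentity LastIdentity { get; private set; }

    public bool HasPermission(string action, IIdentity identity)
    {
        CallCount++;
        LastAction = action;
        LastIdentity = identity;
        return _answer;
    }
}
```

The test needs no thread principal and no static service. It hands MyClass a GenericIdentity and the stub, then checks that DoSomething asked exactly the right question about exactly that identity.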

Is Your Code SOLID: The Interface Segregation Principle

Simplicity is the ultimate sophistication. ~ Leonardo da Vinci

Which one was the volume control?

Robert Martin, in his 1996 article, says:

When clients are forced to depend upon interfaces that they don’t use, then those clients are subject to changes to those interfaces. This results in an inadvertent coupling between all the clients. Said another way, when a client depends upon a class that contains interfaces that the client does not use, but that other clients do use, then that client will be affected by the changes that those other clients force upon the class. We would like to avoid such couplings where possible, and so we want to separate the interfaces where possible.

An example of this problem can be seen in the following interface:

    /// An interface that converts IIdentities into an IMyIdentity (i.e., an identity with Roles and properties)
    public interface IAuthorizationProvider
    {
        IMyIdentity GetRolesFor(IIdentity identity);
        MembershipInfo GetRoleMembershipInfo(string role);
        IEnumerable<string> Roles { get; }

        void AddRole(RoleInfo roleInfo);
        void AddSubjectToRole(string role, IIdentity identity);
    }

No one would argue that an interface with five members isn’t manageable. But look at it from a client’s perspective, and it’s flabby. In the body of code this sample was lifted from, there are two kinds of consumers of this interface:

  1. Users who are simply interested in determining whether an Identity is authorized to do something.
  2. Someone who needs to manage authorizations/role memberships, and what roles are associated with an identity.

The most common consumer is only interested in reading whether an identity is in a role. For that consumer, the rest of the interface is completely unnecessary, and unnecessarily coupled (to say nothing of the security risk of exposing authorization-management APIs to the average user).

The ISP driven improvement for this design might look something like:

    /// An interface that converts IIdentities into an IMyIdentity (i.e., an identity with Roles and properties)
    public interface IAuthorizationProvider
    {
        IMyIdentity GetAuthorizations(IIdentity identity);
    }

    /// An interface for managing roles and the identities that belong to them
    public interface IRoleManager
    {
        MembershipInfo GetRoleMembershipInfo(string role);
        IEnumerable<string> Roles { get; }

        void AddRole(RoleInfo roleInfo);
        void AddSubjectToRole(string role, IIdentity identity);
    }

If you’re wondering why I didn’t derive IRoleManager from IAuthorizationProvider, consider this: if we’re going to all this effort to reduce coupling, then inheritance is the last thing we want. Inheritance is the strongest form of coupling you can have between two types (outside of C++’s concept of “friend”). Interface segregation is about making the surface area of our dependencies smaller. Deriving one interface from the other would drag every member of the base into the derived interface’s surface. Class inheritance goes further still, exposing derived classes to protected data in their bases. Either way, inheritance by definition increases that dependency surface.
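Segregated interfaces don’t require segregated implementations. In this minimal sketch, a single hypothetical `InMemoryRoleStore` implements both interfaces side by side, without either interface inheriting from the other. The common client, a hypothetical `ReportViewer`, asks only for IAuthorizationProvider. The original sample references `IMyIdentity`, `MembershipInfo` and `RoleInfo` without showing them, so the versions below are minimal stand-ins I made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Principal;

// Minimal hypothetical stand-ins for types the original sample uses but doesn't show.
public interface IMyIdentity
{
    string Name { get; }
    bool IsInRole(string role);
}

public sealed class RoleInfo
{
    public RoleInfo(string name) { Name = name; }
    public string Name { get; }
}

public sealed class MembershipInfo
{
    public MembershipInfo(string role, IEnumerable<string> members) { Role = role; Members = members.ToList(); }
    public string Role { get; }
    public IReadOnlyList<string> Members { get; }
}

// The two segregated interfaces from the post.
public interface IAuthorizationProvider
{
    IMyIdentity GetAuthorizations(IIdentity identity);
}

public interface IRoleManager
{
    MembershipInfo GetRoleMembershipInfo(string role);
    IEnumerable<string> Roles { get; }
    void AddRole(RoleInfo roleInfo);
    void AddSubjectToRole(string role, IIdentity identity);
}

// One class may implement both; each client sees only the interface it asks for.
public sealed class InMemoryRoleStore : IAuthorizationProvider, IRoleManager
{
    private readonly Dictionary<string, HashSet<string>> _members = new Dictionary<string, HashSet<string>>();

    public IEnumerable<string> Roles => _members.Keys;

    public void AddRole(RoleInfo roleInfo)
    {
        if (!_members.ContainsKey(roleInfo.Name)) _members[roleInfo.Name] = new HashSet<string>();
    }

    // Throws KeyNotFoundException if the role was never added.
    public void AddSubjectToRole(string role, IIdentity identity) => _members[role].Add(identity.Name);

    public MembershipInfo GetRoleMembershipInfo(string role) => new MembershipInfo(role, _members[role]);

    public IMyIdentity GetAuthorizations(IIdentity identity) =>
        new MyIdentity(identity.Name, _members.Where(kv => kv.Value.Contains(identity.Name)).Select(kv => kv.Key));

    private sealed class MyIdentity : IMyIdentity
    {
        private readonly HashSet<string> _roles;
        public MyIdentity(string name, IEnumerable<string> roles) { Name = name; _roles = new HashSet<string>(roles); }
        public string Name { get; }
        public bool IsInRole(string role) => _roles.Contains(role);
    }
}

// The common client depends only on IAuthorizationProvider, so the
// role-management members are not part of its dependency surface.
public sealed class ReportViewer
{
    private readonly IAuthorizationProvider _auth;
    public ReportViewer(IAuthorizationProvider auth) { _auth = auth; }
    public bool CanView(IIdentity user) => _auth.GetAuthorizations(user).IsInRole("ReportReaders");
}
```

Changes to IRoleManager can no longer ripple into ReportViewer. Administration code, meanwhile, can be the only place that ever holds an IRoleManager reference.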

Keep your interfaces demure. You’ll sleep better. I promise.
