Tuesday, September 21, 2004

The Importance of doing what you say you do.

I've worked in several organisations, in many different industries and styles: from blue chip insurance companies to telecommunications startups to crown corporations. Each organisation had a different way of working, be it XP/Agile, ISO-9000, or Wild West.

However, all of the successful ones had one thing in common. They did what they said they did. For example, if they said they used a waterfall development model, they made sure that that really was how they developed software. If they were XP, they followed the practices (all of them) in Kent Beck's book.

The organisations that struggled were the ones that either didn't say what they did, or worse, said one thing but actually worked entirely differently.

Everyone knows these groups. They don't understand why they fail, and they don't understand why they succeed. When, amazingly, they do succeed, it is usually through sheer force of will by the team.

At first I thought it was certification that differentiated the organisations. The ISO 9000 certified teams developed better software than the others. But that wasn't it either. I've worked for organisations that didn't have ISO 9000 certification and developed good software, on time, and on budget. I've also worked for ISO certified companies that completely failed to actually deliver working products. It helped that the ISO training I had said that ISO didn't guarantee quality. :)

Certifications aren't key, although they do help. I feel that communication is the key. With good communication, the team can quickly come to a consensus on the state of the project and what the next step is.

I see this all over the place. As teams get larger, communication gets harder. It isn't just that the number of possible conversations increases exponentially, it's that each person finds it harder to communicate.

A small team develops a method of working through peer pressure and shared experience. They do things a certain way because they have always done it that way, and it works. They don't have to talk about it because the communication between them is constantly happening. The small team is able to monitor each other to keep each other out of trouble. For example, if someone decides to rewrite a piece of code, they will tend to avoid causing problems for others since they know what everyone else is working on.

As teams get larger, this communication becomes harder - it no longer happens as a side effect. New members don't share the same experiences, so they make the old mistakes all over again. Even worse, people start spending more and more time cleaning up after each other's changes.

I believe that the key is the shared metaphor -- the development culture. If developers in a team are able to say, "We do that", and everyone understands what "that" is, the team is better off. New staff understand what is expected of them, and what to expect of their team mates. If people share the same language, that saves time and leaves room for communicating more important things, like "What's Bill doing today?"

This is where codification, standardisation and certification help. They give everyone a common understanding. It doesn't have to be CMM or RUP or ISO 9000, it just needs to be understood. If the process you are following is based on another one, it is important to state the divergences.

For example, I worked for a group that said they used XP. However, they didn't do continuous integration, pair programming, the planning game, or test driven development, didn't have an onsite customer, and didn't work a 40 hour week. They did refactor, have a coding standard, have an automated build system that could run unit tests (which were disabled because they failed too frequently), and they did have short cycles (albeit with fixed content). They then went from a 4 person team to 12 people over the course of 3 months. There was a lot of angst because new developers who were hired expecting XP kept running into barriers as they tried to go against the flow of the team and do all of XP. Once the new hires learned how the team really worked, the complaints faded into the background and were dealt with as expected problems instead of process failures.

I've always said that if you don't improve how you do things, you're going to get worse. If a team doesn't "do what it says it does", there's no real way to know where the holes are. Without a common understanding, people may think that something is covered when it isn't. If you've played doubles or team sports (tennis, badminton, volleyball) you'll recognise this: it's the moment when everyone watches the ball head towards the ground, each waiting for someone else to play it.

So, "Doing what you say you Do" is all about communication, honesty and self-improvement. This has nothing at all to do with which type of process you use, many are appropriate, including "Hack it until it works". What does matter is that everyone on the the team knows and understands what it is that the team does.

Sunday, September 12, 2004

Duff's Device, Coroutines and Continuations

I've been doing a lot of programming recently after a bit of a hiatus. It's not that I haven't been programming, it's more the type. Instead of writing the deep-down, think-about-it-for-ten-minutes-before-pressing-a-key kind of software, I'm writing more bread and butter code.

The change in style has made me aware of several frustrations. When writing bread and butter code, I find that the vast majority of it is boilerplate. It doesn't change from class to class, and I even find myself performing cut-and-paste programming. I find this frustrating because I feel that my productivity has dropped. I have a certain number of lines of bug-free code per day in me. If I can do more by writing fewer lines, then I'm more productive. Having to write boilerplate saps my precious daily lines of code!

So, I'm on the lookout for new ideas to help me write more code faster. I look for ways to architect the code to avoid having to actually write code. This is the other side of Paul Graham's "Good Hackers Don't Use Java" argument. While I have to use C++, I'm looking for line saving ideas from other programming languages.

One of the biggest consumers of code is the state machine. State machines are everywhere. They are frequently implemented as tables of function pointers, with each state handler repeating a lot of boilerplate at its start and end, something like the sketch below.
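For contrast, here's a rough sketch of that traditional table-driven shape. None of the names below come from a real library; the enum, struct and handlers are purely illustrative. Every state gets its own handler function, each repeating the same dispatch-and-return boilerplate, and a table maps the current state to its handler:

#include <cstdio>

enum State { STATE_IDLE, STATE_RUNNING, STATE_DONE, STATE_COUNT };

struct Machine {
    State state;
};

typedef State (*StateHandler)(Machine &);

// Each handler repeats the same shape: unpack context, do the work, pick the next state.
static State onIdle(Machine &)    { printf("idle\n");    return STATE_RUNNING; }
static State onRunning(Machine &) { printf("running\n"); return STATE_DONE; }
static State onDone(Machine &)    { printf("done\n");    return STATE_DONE; }

static const StateHandler handlers[STATE_COUNT] = { onIdle, onRunning, onDone };

void step(Machine &m) {
    m.state = handlers[m.state](m);   // dispatch through the table
}

int main() {
    Machine m = { STATE_IDLE };
    step(m);   // idle -> running
    step(m);   // running -> done
    return 0;
}

Multiply that dispatch table and the per-handler boilerplate across a few dozen states and the lines add up quickly.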

My explorations for a better state machine implementation have taken me to Duff's Device, Coroutines and Continuations. When I started, I knew what Duff's Device and Coroutines were, but I only had a vague concept (and an incorrect one at that) of what a Continuation was.

First, an introduction

Everyone out there has probably heard of loop unrolling. This is when you take a long running loop in your program and attempt to perform more transactions per iteration. The starting point looks something like this:

int j=100;
for (int i=0;i<j;i++) {
    printf("%d\n", i);
}

To speed up this type of loop, you attempt to perform more printf's per iteration. Simplistically, you will try something like this:

for (int i=0;i<j;i+=5) {
    printf("%d\n", i);
    printf("%d\n", i+1);
    printf("%d\n", i+2);
    printf("%d\n", i+3);
    printf("%d\n", i+4);
}

However, what happens if j = 3? The code will (now incorrectly) still output 0-4. There is a way to fix it...

Enter Duff's Device.

Essentially, Duff's Device makes use of some interesting behaviour of the switch statement. Case labels act like named goto targets rather than the branches of an if statement. This gives switch the ability to jump into the middle of other scopes, including loop bodies!

So, let's change the code to use Duff's Device:

int i=0;
int j=3;

/* j%5 picks how many of the unrolled statements run on the first,
   partial pass; after that, each pass through the loop prints five values. */
switch (j%5) {
case 0:
    for (;i<j;) {
        printf("%d\n", i++);
case 4:
        printf("%d\n", i++);
case 3:
        printf("%d\n", i++);
case 2:
        printf("%d\n", i++);
case 1:
        printf("%d\n", i++);
    }
}

This is how loops are unrolled. In this way, we get many more operations for each iteration through the loop.

At this point, the audience yells out "But this doesn't save me any lines of code!", and they would be correct. It's Duff's Device combined with the next concept that can save us code. Enter Coroutines.

Coroutines

Coroutines are a form of co-operative multitasking. At some point the task indicates that it yields the processor, allowing another thread of control to execute. If the task doesn't give up control, it is never interrupted. Most systems have moved on to fully independent threads, but there are still classes of systems where threads aren't appropriate. Good examples are systems with several hundred thousand independent operations active at any time. It isn't possible to have a real thread for each operation, and managing a thread pool is just as hard as managing coroutines. Add to that the benefit of avoiding locking and resource sharing issues, and the argument for coroutines becomes pretty compelling. On those systems, code can still be simplified by using coroutines. State machine code in particular can be dramatically simplified, turning it into a linear, top-to-bottom function rather than a set of states managed by an array or switch.

For example, consider the following (note, the code only works with GCC):

#include <cstdio>

class coroutine {
   public:
   coroutine() : state(0) {}
   virtual ~coroutine() {}
   virtual void run() = 0;
   protected:
   int state;    // the case label at which the next call to run() resumes
};

#define coBEGIN() switch (state) {case 0:
#define coRETURN() state = __LINE__;return;case __LINE__:
#define coEND()   }

class coBasic : public coroutine {
    public:
    virtual void run();
};

void coBasic::run() {
    coBEGIN();

    fprintf(stderr, "Here we are in state 1\n");
    coRETURN();

    fprintf(stderr, "Here we are in state 2\n");
    coRETURN();

    fprintf(stderr, "Here we are in state 3\n");

    coEND();
}

int main(void) {
   coBasic co;

   co.run();
   co.run();
   co.run();
}

Where does this save us code? In several places. First, we don't have to write a function header for each state; code flows from one state to the next. Next, we avoid all of the per-state setup code. Frequently a state machine has to do at least a little marshalling before it runs. Here, that code is written once and shared among all of the states, instead of being copied into each one or split out into a separate function.
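For instance, with the macros above the shared setup can simply sit in front of coBEGIN(). This is only a sketch; coWorker and refresh_inputs() are my own illustrative names, not part of the code above:

class coWorker : public coroutine {
    public:
    virtual void run() {
        refresh_inputs();   // shared marshalling: runs on every call, whatever the current state

        coBEGIN();
        fprintf(stderr, "state 1 sees fresh inputs\n");
        coRETURN();
        fprintf(stderr, "so does state 2\n");
        coEND();
    }
    private:
    void refresh_inputs() { /* e.g. pull pending messages into member variables */ }
};

Because the switch dispatch happens inside coBEGIN(), anything written before it is executed on every invocation, no matter which state we resume in.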

There are some limitations to this implementation. First, it is very difficult to specify the next state to invoke; linear is the name of the game. It is possible to change the macros to use specific state names instead of a simple return, which would allow more complex state machines to be created (see the sketch below). If we restrict ourselves to GCC, whose preprocessor allows macro overloading, we could even combine the two, with both named and unnamed states.
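Here's one possible shape for the named-state variant, as a sketch only; the macro names, the enum and coNamed are my own, not from any library. coRETURN_TO() parks the coroutine at an explicit state, so the next run() can resume anywhere rather than only at the following line:

#include <cstdio>

enum { STATE_BEGIN = 0, STATE_WAIT, STATE_DONE };

#define coBEGIN()          switch (state) { case STATE_BEGIN:
#define coSTATE(name)      case name:
#define coRETURN_TO(name)  do { state = (name); return; } while (0)
#define coEND()            }

class coNamed {
   public:
   coNamed(bool wait) : state(STATE_BEGIN), need_to_wait(wait) {}
   void run() {
       coBEGIN();
       fprintf(stderr, "begin\n");
       if (need_to_wait)
           coRETURN_TO(STATE_WAIT);   // explicitly choose the next state...
       coRETURN_TO(STATE_DONE);       // ...or jump straight past it

       coSTATE(STATE_WAIT);
       fprintf(stderr, "waiting\n");
       coRETURN_TO(STATE_DONE);

       coSTATE(STATE_DONE);
       fprintf(stderr, "done\n");
       coEND();
   }
   private:
   int state;
   bool need_to_wait;
};

int main() {
    coNamed co(false);
    co.run();   // prints "begin"
    co.run();   // prints "done": STATE_WAIT was skipped entirely
    return 0;
}

The price is that every yield now has to name its successor, which is exactly the bookkeeping the line-number version avoids.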

Also, there are some limitations imposed by the use of Duff's Device. C++ does not allow a jump to skip over the initialisation of a variable, so code like the following will result in a compilation error:

void coBasic::run() {
    coBEGIN();

    fprintf(stderr, "Here we are in state 1\n");
    std::map<int, int> foobar; // <-- ERROR
    coRETURN();

    fprintf(stderr, "Here we are in state 2\n");

    coEND();
}

To get around this, we can do two things. We can either move to C-style declarations and do all of our declarations at the start of the function, or we can modify the macros slightly. The modified macros would look like this:

#define coBEGIN() switch (state) {case 0:{
#define coRETURN() state = __LINE__;return;}case __LINE__:{
#define coEND()   }}

As you can see, we're adding an additional scope around each state. That way, we never jump into the scope of an object whose construction we skipped, but we do end up with some strange-looking compiler errors. Consider the following code:

void coBasic::run() {
    coBEGIN();

    fprintf(stderr, "Here we are in state 1\n");
    std::map<int, int> foobar;
    coRETURN();

    fprintf(stderr, "Here we are in state 2 map size = %d\n",
            foobar.size()); // <-- ERROR

    coEND();
}

This will generate an undeclared variable error for foobar, because foobar is in scope only within the previous state. So the code gets slightly more confusing, because the state scopes aren't visually obvious. It does, however, allow state-local variables to be declared and used. Bear in mind that a state-local variable only lives until the next coRETURN(): run() genuinely returns at each yield and its stack frame is torn down, so anything that has to survive between calls must be stored in the object itself.
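A sketch of that trade-off, assuming the coroutine base class and the scoped macros above (coCounting is an illustrative name of my own): only data stored in the object survives a yield.

class coCounting : public coroutine {
    public:
    coCounting() : total(0) {}
    virtual void run() {
        coBEGIN();

        total = 10;                        // member variable: survives the yield
        int scratch = total * 2;           // state-local: destroyed when run() returns
        fprintf(stderr, "scratch = %d\n", scratch);
        coRETURN();

        fprintf(stderr, "total = %d\n", total);   // still 10 on the next call

        coEND();
    }
    private:
    int total;
};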

Continuations

When I originally started to write this article, I misunderstood what a continuation was. I thought it was simply a coroutine by another name. I was wrong. Continuations are about continuing execution at a stored location in the procedure, but they add a new wrinkle. Where the coroutine above returns to its caller, a continuation passes the execution location down the stack. Perhaps a picture will help:

Coroutine:

 A -> B (no state)
   <- B (point 1)
 A -> B (starts at point 1)
   <- B returns
...

Continuation:

 A -> B (no state)
        -> C (B at point 1)
             -> D (B still at point 1)
                 -> B (starts at point 1)
   <--------------- B returns all the way back to A
A

Continuations seem to be unfamiliar to most programmers, with few people understanding their benefits, or even being able to explain how they work. The easiest explanation I found was on c2.com (http://c2.com/cgi/wiki?ContinuationExplanation), especially the C code version (http://c2.com/cgi/wiki?ContinuationsInCee). The form of continuation presented in that code is called a "Single Use Continuation", meaning it can only be invoked while it is still on the stack, and only once. Some languages (Lisp/Scheme/Dylan?) fully support continuations, allowing them to be invoked multiple times from any point.

There is one more thing about continuations. They save the state of the stack at the point the continuation is created, allowing code to return to that location using the existing local variables. This is very useful for unraveling network and user interaction code. For this we really need a language that supports continuations, such as Scheme, Lisp and Dylan. In C/C++, we can mostly fake it with some limitations.

We can fake them using various methods. In C, we can use setjmp/longjmp or, on x86, modify the stack base pointer as in the C2 example. In C++, we can do much the same using exceptions. Exceptions let us properly clean up the intervening stack frames while returning to an earlier one. In C, we can use the speedier approach and simply "go there".
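As a rough sketch of the setjmp/longjmp flavour (my own illustration, not the C2 code): the continuation is only valid while the frame that captured it is still on the stack, and it can only be taken once.

#include <setjmp.h>
#include <stdio.h>

static jmp_buf resume_point;

static void do_work(void) {
    printf("B: working, then invoking the continuation\n");
    longjmp(resume_point, 1);   // unwind straight back to the setjmp below
    // never reached
}

int main(void) {
    if (setjmp(resume_point) == 0) {
        // first pass: the continuation has just been captured; call down the stack
        do_work();
    } else {
        // we arrive here via longjmp: execution continues at the capture point
        printf("A: continued past do_work\n");
    }
    return 0;
}

Note that longjmp makes no attempt to run destructors in the frames it skips, which is exactly why the C++ version below leans on exceptions instead.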

In terms of the diagram above, the C/C++ continuations I'm presenting here also invalidate any continuations captured in stack frames C or D once B resumes, because those frames are unwound.

For a basic implementation, consider the following:

// Note: 'continue' is a reserved word in C++, so the method is named resume() instead.
template <class X>
class continuation {
   public:
   typedef X myContinueSignal;
   continuation() : state(0) {}
   virtual ~continuation() {}
   virtual void run() = 0;
   void resume();
   protected:
   int state;
};

template <class X>
void continuation<X>::resume() {
    throw X();
}

struct myContinuationContinueSignal {};

class myContinuation: public continuation<myContinuationContinueSignal> {
    public:
    virtual void run();
};

#define coBEGIN() switch (state) {case 0:{
#define coRETURN() state = __LINE__;return;}case __LINE__:{
#define coCALL(x)  state = __LINE__;try {x;}catch (myContinuation::myContinueSignal&){}}case __LINE__:{
#define coEND()   }}

I hope I've got the code correct; I've never actually plugged any of this into a compiler, so treat it as a sketch rather than tested code.

This allows us to take the original state machine and add some niftiness.

void get_some_more_information(myContinuation *c) {
    // Do some stuff, like send a message and then get a response.
    // Do some more stuff, like put the response into the class.
    c->resume();
}

void myContinuation::run() {
    coBEGIN();

    fprintf(stderr, "Here we are in state 1\n");
    coRETURN();

    fprintf(stderr, "Here we are in state 2\n");
    coRETURN();

    coCALL(get_some_more_information(this));

    fprintf(stderr, "Here we are in state 3\n");

    coEND();
}

Here, if get_some_more_information decides it already has the requested information, it can continue immediately. If it decides that it will have to block, it can use its own continuation to throw back through myContinuation::run while still preserving the state for later execution.

Why is this useful? We could instead pass the coroutine down and execute it again later. If we did that, the code would execute again whenever get_some_more_information eventually returned, possibly at a new state, and that may not be what we want. It results in hard-to-manage execution paths, with the code possibly making extra, undesired state transitions.

These continuations are still limited. Due to the try/catch blocks, we can't really interleave continuations of the same type. The destruction of the higher stack frames makes it difficult to have multiple continuations active at any one point in time. Finally, we can't return to the same continuation point more than once, meaning we can't store a state and then return to it at a future point, to, for example, re-attempt a transaction.

Coroutines and Continuations. Both are great at decreasing the amount of bug-ridden, hard to read, unmaintainable C++ state machine code you have to write. I know they've helped me get more done in a day.