Sunday, September 12, 2004

Duff's Device, Coroutines and Continuations

I've been doing a lot of programming recently after a bit of a hiatus. It's not that I haven't been programming, it's more the type. Instead of writing deep-down, think about it for 10 minutes and then press a key, kind of software I'm writing more bread and butter code.

The change in style has made me aware of several frustrations. When writing bread and butter code, I find that a vast majority of it is boilerplate. It doesn't change from class to class, and I even find myself performing cut-and-paste programming. I'm finding this frustrating because I feel that my productivity has dropped. I find that I have a certain number of lines of bug-free code per day in me. If I can do more by writing fewer lines, then I'm more productive. Having to write boilerplate saps my precious daily lines of code!

So, I'm on the lookout for new ideas to help me write more code faster. I look for ways to architect the code to avoid having to actually write code. This is the other side of Paul Graham's "Good Hackers Don't Use Java" argument. While I have to use C++, I'm looking for line saving ideas from other programming languages.

One of the more code sucking types of code is the state machine. State machines are everywhere. They are frequently implemented as tables of functions, with each state having a lot of boilerplate at the start and the end.

My explorations for a better state machine implementation have taken me to Duff's Device, Coroutines and Continuations. When I started, I knew what Duff's Device and Coroutines were, but I only had a vague concept (and an incorrect one at that) of what a Continuation was.

First, an introduction

Everyone out there has probably heard of loop unrolling. This is when you take a long running loop in your program and attempt to perform more transactions per iteration. The starting point looks something like this:

int j=100;
for (int i=0;i<j;i++) {
    printf("%d\n", i);
}

To speed up this type of loop, you attempt to perform more printf's per iteration. Simplistically, you will try something like this:

for (int i=0;i<j/5;i+=5) {
    printf("%d\n", i);
    printf("%d\n", i+1);
    printf("%d\n", i+2);
    printf("%d\n", i+3);
    printf("%d\n", i+4);
}

However, what happens if j = 3? The code will (now incorrectly) still output 0-4. There is a way to fix it...

Enter Duff's Device.

Essentially, Duff's Device makes use of some interesting behaviour of switch and case statements. The switch/case statements act like named goto's instead of if statements. This gives them the ability to jump into scopes!

So, let's change the code to use Duff's Device:

int i=0;
int j=3;

switch (j%5) {
case 0:
for (;i<j;i+=5) {
    printf("%d\n", i);
case 4:
    printf("%d\n", i+1);
case 3:
    printf("%d\n", i+2);
case 2:
    printf("%d\n", i+3);
case 1:
    printf("%d\n", i+4);
}

This is how loops are unrolled. In this way, we get many more operations for each iteration through the loop.

At this point, the audience yells out "But this doesn't save me any lines of code!", and they would be correct. It's Duff's Device with the next concept that can save us code. Enter Coroutines.

Coroutines

Coroutines are co-operative multitasking. At some point the task will indicate that it yields the processor, allowing another thread of control to be executed. If the task doesn't give up control, it is never interrupted. Most systems have moved on to using full blown independent threads, but there are still some classes of systems where threads aren't appropriate. Good examples are systems with several hundred thousand independent operations active at any time. It isn't possible to have a real thread for each operation, and managing a thread pool is just as hard as managing coroutines. Add to that the benefits of avoiding locking and resource sharing issues and the argument for coroutines becomes pretty compelling. On those systems, code can still be simpflified by using coroutines. State machine code can be dramatically simplified, turning it into a linear top to bottom function rather than a set of states managed by an array or switch.

For example, consider the following (note, the code only works with GCC):

class coroutine {
   public:
   coroutine() : state(0) {}
   virtual void run();
   private:
   int state;
}

#define coBEGIN() switch (state) {case 0:
#define coRETURN() state = __LINE__;return;case __LINE__:
#define coEND()   }

class coBasic : public coroutine {
    public:
    virtual void run();
}

void coBasic::run() {
    coBEGIN();

    fprintf(stderr, "Here we are in state 1\n");
    coRETURN();

    fprintf(stderr, "Here we are in state 2\n");
    coRETURN();

    fprintf(stderr, "Here we are in state 3\n");

    coEND();
}

int main(void) {
   coBasic co;

   co.run();
   co.run();
   co.run();
}

Where does this save us code? We save code in several locations. First, we don't have to write a function header for each state. Code will flow from one state to the next. Next up, we avoid all of the setup code. Frequently code will have to do at least a little bit of marshalling as it prepares to run. Using this, that code is shared among all of the states, instead of being copied, or shared as a separate function.

There are some limitations to this implementation. First, it is very difficult to specify the next state to invoke, linear is the name of the game. It is possible to change the macros to use specific state names instead of a simple return, which would allow more complex state machines to be created. If we restrict ourselves to GCC, whose preprocessor allows macro overloading, we can even combine the two with both named and unnamed states.

Also, there are some limitations imposed by the use of Duff's Device. C++ does not allow code to jump over constructors, so code like the following will resultin a compilation error:

void coBasic::run() {
    coBEGIN();

    fprintf(stderr, "Here we are in state 1\n");
    std::map<int, int> foobar; // <-- ERROR
    coRETURN();

    fprintf(stderr, "Here we are in state 2\n");

    coEND();
}

To get around this, we can do two things. We can either move to C-style declarations and do all of our declarations at the start of the function, or we can modify the macros slightly. The modified macros would look like this:

#define coBEGIN() switch (state) {case 0:{
#define coRETURN() state = __LINE__;return;}case __LINE__:{
#define coEND()   }}

As you can see, we're adding an additional scope around each state. In that way, we avoid jumping over a constructor for an object we could conceivably use, but we do end up having some strange compiler problems. Consider the following code:

void coBasic::run() {
    coBEGIN();

    fprintf(stderr, "Here we are in state 1\n");
    std::map<int, int> foobar;
    coRETURN();

    fprintf(stderr, "Here we are in state 2 map size = %d\n",
            foobar.size()); // <-- ERROR

    coEND();
}

This will generate an undefined variable error for foobar, because foobar is only defined in the previous state. So, we get slightly more confusing code because scopes aren't necessarily clearly defined. However, this does allow state-local variables to be declared and used.

Continuations

When I originally started to write this article, I misunderstood what a continuation was. I thought it was simply a coroutine by another name. I was wrong. Continuations are about continuing execution at stored location in the procedure, but they add a new wrinkle. Where the coroutine above returns, a continuation passes the execution location down the stack. Perhaps a picture will help:

Coroutine:

 A -> B (no state)
   <- B (point 1)
 A -> B (starts at point 1)
   <- B returns
...

Continuation
 A -> B (no state)
        -> C (B at point 1)
             -> D (B still at point 1)
                 -> B (starts at point 1)
   <--------------- B returns all the way back to A
A

It seems that continuations are a relatively new programming style, with few people understanding their benefits, or even being able to explain how they work. The easiest explanation I found was on c2.com (http://c2.com/cgi/wiki?ContinuationExplanation), especially the C code version (http://c2.com/cgi/wiki?ContinuationsInCee). The form of continuation presented in that code is called a "Single Use Continuation", meaning it can only be executed if it is still on the stack, and that it can only be invoked once. Some languages (Lisp/Scheme/Dylan?) fully support continuations allowing them to be executed multiple times at any point.

There is one more thing about continuations. They save the state of the stack at the point the continuation is created, allowing code to return to that location using the existing local variables. This is very useful for unraveling network and user interaction code. For this we really need a language that supports continuations, such as Scheme, Lisp and Dylan. In C/C++, we can mostly fake it with some limitations.

We can fake them using various methods. In C, we can use setjmp/longjmp or on x86, modify the stack base pointer as in the C2 example. In C++, we can do much of the same using exceptions. The use of exceptions allows us to properly clean up the parent stack frames, while returning to the previous stack frame. In C, we can use the speedier code and simply "Go There".

Using the previous example, the C/C++ continuations I'm presenting here would also invalidate any continuations in stack frames C or D when B is executed the second time.

For a basic implementation, consider the following:

class continuation<class X> {
   public:
   typedef X myContinueSignal;
   continuation() : state(0) {}
   virtual void run();
   void continue();
   private:
   int state;
}

void continuation::continue() {
    throw X();
}

struct myContinuationContinueSignal {}

class myContinuation: public continuation<class myContinuationContinueSignal> {
    public:
    void run();
}

#define coBEGIN() switch (state) {case 0:{
#define coRETURN() state = __LINE__;return;}case __LINE__:{
#define coCALL(x)  state = __LINE__;try {x;}catch (myContinuation::myContinueSignal&){}case __LINE:{
#define coEND()   }}

I hope I've got the code correct, I've never actually plugged any of this into a compiler, so you can be reasonably sure it won't work.

This allows us to take the original state machine and add some niftyness.

void get_some_more_information(continuation *continuation) {
    // Do some stuff, like send a message and then get a response
    // Do some more stuff, like put the response into the class.
    continuation->continue();
}

void myContinuation::run() {
    coBEGIN();

    fprintf(stderr, "Here we are in state 1\n");
    coRETURN();

    fprintf(stderr, "Here we are in state 2\n");
    coRETURN();

    coCALL(get_some_more_information(this));

    fprintf(stderr, "Here we are in state 3\n");

    coEND();
}

Here, if get_some_more_information decides it already has the requested information, it can continue immediately. If decides that it will have to block, it can use it's own continuation to throw back through myContinuation::run while still preserving the state for later execution.

Why is this useful? We could pass the coroutine down and execute it again later. If we did that, when get_some_more_information eventually returned the code would again execute. We could try to execute the continuation at a possibly new state, but that may not be what we want. It results in hard to manage execution paths, with the code possibly getting extra, undesired state transitions.

These continuations are still limited. Due to the try/catch blocks, we can't really interleave continuations of the same type. The destruction of the higher stack frames makes it difficult to have multiple continuations active at any one point in time. Finally, we can't return to the same continuation point more than once, meaning we can't store a state and then return to it at a future point, to, for example, re-attempt a transaction.

Coroutines and Continuations. Both are great at decreasing the amount of bug-ridden, hard to read, unmaintainable C++ state machine code you have to write. I know it's helped me get more done in a day.

3 comments:

Anonymous said...

Hmmm.

Anonymous said...

Continuations do not save the stack.

This is a fundamental misunderstanding.

They require that the data that the continuation requires remain extant.

This often requires that a stack not be used for local storage.

Jason Pollock said...

Ah, so it looks like they're saving the stack, but actually they're not using it at all. Nice.