Friday, April 21, 2023

Trying to write code with Bard and ChatGPT.

I have a standard test that I keep giving LLMs to see if I can introduce them into my workflow. So far, they haven't been useful.

My standard problem is asking the LLM to "write a function in java to decode a rfc 3588 diameter header"

It's an easy standard, and an easy piece of code. There are some fiddly bits brought in because it's Java.

Here is my latest transcript, I’ve pruned the getter’s and the field definitions where I didn’t think they were important. This is not the level of coding for a Google job interview. This quality of code wouldn't even get you past a screening interview. Anyone who says these models can program is selling you something.

Jason: write a function in java to decode a rfc 3588 diameter header


Bard: Sure, here is a function in Java to decode a RFC 3588 Diameter header:


public class DiameterHeader {

    public DiameterHeader(byte[] data) {

        this.code = data[0];

        this.identifier = data[1];

        this.length = data[2] << 8 | data[3];

        this.data = Arrays.copyOfRange(data, 4, 4 + length);

    }

}

Jason: The code is buggy, please fix it.


Bard: Sure, here is the fixed code:


public class DiameterHeader {

    public DiameterHeader(byte[] data) {

        if (data.length < 4) {

            throw new IllegalArgumentException("Invalid Diameter header data");

        }


        this.code = data[0];

        this.identifier = data[1];

        this.length = data[2] << 8 | data[3];

        this.data = Arrays.copyOfRange(data, 4, 4 + length);

    }

}



Jason: The diameter header starts with an 8 bit version



Bard: You are correct. The Diameter header starts with an 8-bit version field. The version field is used to indicate the version of the Diameter protocol that is being used. The fixed code should be updated to include the version field.


Here is the updated code:


public class DiameterHeader {


    public DiameterHeader(byte[] data) {

        if (data.length < 5) {

            throw new IllegalArgumentException("Invalid Diameter header data");

        }


        this.version = data[0];

        this.code = data[1];

        this.identifier = data[2];

        this.length = data[3] << 8 | data[4];

        this.data = Arrays.copyOfRange(data, 5, 5 + length);

    }

}


Jason: then there is:

2) a 24 bit message length

3) a 8 bit flag field

4) a 24 bit command code

5) a 32 bit application id

6) a 32 bit hop-by-hop id

7) a 32 bit end to end id



Bard: Sure, here is the updated code that includes the additional fields:


public class DiameterHeader {


    private int version;

    private int messageLength;

    private int flagField;

    private int commandCode;

    private int applicationId;

    private int hopByHopId;

    private int endToEndId;


    public DiameterHeader(byte[] data) {

        if (data.length < 13) {

            throw new IllegalArgumentException("Invalid Diameter header data");

        }


        this.version = data[0];

        this.messageLength = data[1] << 16 | data[2] << 8 | data[3];

        this.flagField = data[4];

        this.commandCode = data[5] << 16 | data[6] << 8 | data[7];

        this.applicationId = data[8] << 24 | data[9] << 16 | data[10] << 8 | data[11];

        this.hopByHopId = data[12];

        this.endToEndId = data[13];

    }

}


Sources

  1. github.com/raygomez/diacoder subject to license (MIT)


NOTE FROM JASON: I love that it quoted the source, however the source has the standard “24 bit and 32 bit unsigned values don’t fit in integers” bug that these LLMs keep producing.



Jason: hop by hop is 32 bits.


Bard: You are correct. The hop-by-hop ID is 32 bits in size. The code has been updated to reflect this.

Here is the updated code:


public class DiameterHeader {

    public DiameterHeader(byte[] data) {

        if (data.length < 14) {

            throw new IllegalArgumentException("Invalid Diameter header data");

        }


        this.version = data[0];

        this.messageLength = data[1] << 16 | data[2] << 8 | data[3];

        this.flagField = data[4];

        this.commandCode = data[5] << 16 | data[6] << 8 | data[7];

        this.applicationId = data[8] << 24 | data[9] << 16 | data[10] << 8 | data[11];

        this.hopByHopId = data[12] << 24 | data[13] << 16 | data[14] << 8 | data[15];

        this.endToEndId = data[16];

    }

}


Sources

  1. github.com/raygomez/diacoder subject to license (MIT)



Jason: end to end is 32 bits


Bard: You are correct. The end-to-end ID is 32 bits in size. The code has been updated to reflect this.


Here is the updated code:


public class DiameterHeader {


    public DiameterHeader(byte[] data) {

        if (data.length < 17) {

            throw new IllegalArgumentException("Invalid Diameter header data");

        }


        this.version = data[0];

        this.messageLength = data[1] << 16 | data[2] << 8 | data[3];

        this.flagField = data[4];

        this.commandCode = data[5] << 16 | data[6] << 8 | data[7];

        this.applicationId = data[8] << 24 | data[9] << 16 | data[10] << 8 | data[11];

        this.hopByHopId = data[12] << 24 | data[13] << 16 | data[14] << 8 | data[15];

        this.endToEndId = data[16] << 24 | data[17] << 16 | data[18] << 8 | data[19];

    }

}


Sources

  1. github.com/raygomez/diacoder subject to license (MIT)



Jason: 32 bit fields don't fit in java int, please fix it.


Bard: You are correct. 32-bit fields do not fit in Java int, which is a 32-bit signed integer. The code has been updated to use long, which is a 64-bit signed integer.


Here is the updated code:


public class DiameterHeader {


    private long version;

    private long messageLength;

    private long flagField;

    private long commandCode;

    private long applicationId;

    private long hopByHopId;

    private long endToEndId;


    public DiameterHeader(byte[] data) {

        if (data.length < 20) {

            throw new IllegalArgumentException("Invalid Diameter header data");

        }


        this.version = data[0];

        this.messageLength = data[1] << 24 | data[2] << 16 | data[3] << 8 | data[4];

        this.flagField = data[5];

        this.commandCode = data[6] << 24 | data[7] << 16 | data[8] << 8 | data[9];

        this.applicationId = data[10] << 24 | data[11] << 16 | data[12] << 8 | data[13];

        this.hopByHopId = data[14] << 24 | data[15] << 16 | data[16] << 8 | data[17];

        this.endToEndId = data[18] << 24 | data[19] << 16 | data[20] << 8 | data[21];

    }

}


Sources

1. github.com/mustafabayar/EasyDiameter subject to license (MIT)




Jason: 8 bit fields should be short


Bard: You are correct. 8-bit fields should be short, which is a 16-bit signed integer. The code has been updated to use short, and the data array is now 13 bytes in size.


Here is the updated code:


public class DiameterHeader {


    private short version;

    private short messageLength;

    private short flagField;

    private short commandCode;

    private short applicationId;

    private short hopByHopId;

    private short endToEndId;


    public DiameterHeader(byte[] data) {

        if (data.length < 13) {

            throw new IllegalArgumentException("Invalid Diameter header data");

        }


        this.version = data[0];

        this.messageLength = (short)(data[1] << 8 | data[2]);

        this.flagField = data[3];

        this.commandCode = (short)(data[4] << 8 | data[5]);

        this.applicationId = (short)(data[6] << 8 | data[7]);

        this.hopByHopId = data[8];

        this.endToEndId = data[9];

    }

}


Jason: undo that last edit.


Bard: Sure, here is the code without the last edit:


public class DiameterHeader {


    private int version;

    private int messageLength;

    private int flagField;

    private int commandCode;

    private int applicationId;

    private int hopByHopId;

    private int endToEndId;


    public DiameterHeader(byte[] data) {

        if (data.length < 20) {

            throw new IllegalArgumentException("Invalid Diameter header data");

        }


        this.version = data[0];

        this.messageLength = data[1] << 24 | data[2] << 16 | data[3] << 8 | data[4];

        this.flagField = data[5];

        this.commandCode = data[6] << 24 | data[7] << 16 | data[8] << 8 | data[9];

        this.applicationId = data[10] << 24 | data[11] << 16 | data[12] << 8 | data[13];

        this.hopByHopId = data[14] << 24 | data[15] << 16 | data[16] << 8 | data[17];

        this.endToEndId = data[18] << 24 | data[19] << 16 | data[20] << 8 | data[21];

    }

}


I apologize for the previous edit. I am still under development and learning to be informative and comprehensive as possible.


Sources

1. github.com/raygomez/diacoder subject to license (MIT)


Jason: change all the fields to longs


Bard: Sure, here is the code with all the fields changed to longs:


public class DiameterHeader {


    private long version;

    private long messageLength;

    private long flagField;

    private long commandCode;

    private long applicationId;

    private long hopByHopId;

    private long endToEndId;


    public DiameterHeader(byte[] data) {

        if (data.length != 20) {

            throw new IllegalArgumentException("Invalid Diameter header data");

        }


        this.version = data[0];

        this.messageLength = data[1] << 24 | data[2] << 16 | data[3] << 8 | data[4];

        this.flagField = data[5];

        this.commandCode = data[6] << 24 | data[7] << 16 | data[8] << 8 | data[9];

        this.applicationId = data[10] << 24 | data[11] << 16 | data[12] << 8 | data[13];

        this.hopByHopId = data[14] << 24 | data[15] << 16 | data[16] << 8 | data[17];

        this.endToEndId = data[18] << 24 | data[19] << 16 | data[20] << 8 | data[21];

    }

}

Sources

1. github.com/mustafabayar/EasyDiameter subject to license (MIT)


Review


Both ChatGPT and Bard are obsequious. They need to quit telling me I'm correct. If they tell me I've got the field "correct", they should then know what the rest of the fields are. Otherwise, they should say something like "understood".


I love that it is getting pretty good at editing code. This hints at where AI models might be helpful - applying edit requests - you send a document out for review, someone provides a comment, the model recommends an edit and you either take the edit, or provide your own. Sometimes it saves you time, but even if it gets it wrong, it's not wasting your time.


Bard didn’t stick with initial fixes, so it broke MessageLength and CommandCode again after I told it they were 24 bit fields. It also kept getting the buffer overflow check wrong. The standard “24 bit unsigned values don’t fit in a 32 bit java int.” bug also showed up. However, that bug is in the referenced source, so yup that’s why you pay me.


That I had to keep going _back_ and refixing the bugs is ugly. That I missed them until final review is even worse.


Bard providing a reference for the code, complete with LICENSE was really impressive. 


I tried again with ChatGPT. It produced just as buggy code and I gave up pretty early. I followed the exact same script as Bard, this is what it produced after the “2-7” field description prompt:


public static void decodeDiameterHeader(byte[] headerBytes) throws Exception {

    // Check that the header is at least 20 bytes long

    if (headerBytes.length < 20) {

        throw new Exception("Invalid header length");

    }


    // Extract the header fields

    int version = (headerBytes[0] & 0xff) >> 5;

    int messageLength = ((headerBytes[0] & 0x3) << 16) |

                        ((headerBytes[1] & 0xff) << 8) |

                        (headerBytes[2] & 0xff);

    int flags = headerBytes[3] & 0xff;

    int commandCode = ((headerBytes[4] & 0xff) << 16) |

                      ((headerBytes[5] & 0xff) << 8) |

                      (headerBytes[6] & 0xff);

    long applicationId = ((long)(headerBytes[7] & 0xff) << 24) |

                         ((long)(headerBytes[8] & 0xff) << 16) |

                         ((long)(headerBytes[9] & 0xff) << 8) |

                         (headerBytes[10] & 0xff);

    long hopByHopId = ((long)(headerBytes[11] & 0xff) << 24) |

                      ((long)(headerBytes[12] & 0xff) << 16) |

                      ((long)(headerBytes[13] & 0xff) << 8) |

                      (headerBytes[14] & 0xff);

    long endToEndId = ((long)(headerBytes[15] & 0xff) << 24) |

                      ((long)(headerBytes[16] & 0xff) << 16) |

                      ((long)(headerBytes[17] & 0xff) << 8) |

                      (headerBytes[18] & 0xff);

}


For some reason, it kept wanting to right shift version. I told it that it shouldn’t do it, and it removed it, but it came back immediately. The oscillating bug is what caused me to give up the will to live and abandon the test. On the plus side, it new that 32 bit integers didn't fit in the int type. However, it didn't know that 24 bit ints didn't fit either.


Total effort with Bard was on the order of 15 minutes - so very unproductive.


For anyone reading along, the spec is public and very easy for a human programmer to understand:


Diameter Header - RFC 3588, Page 32.



    0                   1                   2                   3

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   |    Version    |                 Message Length                |

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   | command flags |                  Command-Code                 |

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   |                         Application-ID                        |

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   |                      Hop-by-Hop Identifier                    |

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   |                      End-to-End Identifier                    |

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Addendum


I sent a fix back upstream to the referenced code. In writing it, I learned that Java is insanely odd and likely broken in all sorts of ways. It doesn't do what you think it does...

byte womble = -1; // 0xff doesn't fit, remember?

womble & 0xFF << 16; // 16711680 (what we want!)

is not the same as

womble << 16; // -65536

Seems the 0xFF converts it to an int bitwise and then does the shift, whereas the other just takes the -1 and makes it a 16 bit -1.

Yeah, as a C programmer, that's pretty messed up.

So, extracting the value shouldn't look like:

(input[1] << 16) + (input[2] << 8 )+ input[3]

which looks ok when you're reading code.

It shouldn't be this either:

(input[1] << 16) | (input[2] << 8 ) | input[3]

It needs to be:

(input[1] & 0xFF) << 16 | (input[2] & 0xFF) << 8 | (input[3] & 0xFF)

So, that puts an entirely different spin on the results. THEY'RE ALL BROKEN IN STRANGE AND WONDERFUL WAYS THAT YOU WILL NEVER FIND.

I only found it because I went, "hmm, that will _take_ a 24 bit number, I wonder what will happen if I _use_ a 24 bit number". Oooh, explosions!