Niko Schwarz's science and programming: March 2010

Thursday, March 25, 2010

Git tree objects, how are they stored?

A git tree is git’s internal representation for directories. Trees have entries which are either trees themselves, or blobs.

The general format is: First 4 bytes declaring the object type. In our case, those four bytes are “tree”, ASCII-encoded. Then comes a space, and then the entries, separated by nothing.
The exact format is the following. All capital letters are “non-terminals” that I’ll explain shortly.

    tree ZN(A FNS)*

where

N is the NUL character

Z is the size of the object in bytes

A is the unix access code, ASCII encoded, for example> 100644 for a vanilla file.

F is the filename, (I’m not sure about the encoding. It’s definitely ASCII-compatible), NUL-terminated.

S is the 20 byte SHA hash of the entry pointed to, 20 bytes long.

Here’s an example. Say we have a directory with two files, called test and test2. The SHA of the directory is f0e12ff4a9a6ba281d57c7467df585b1249f0fa5. You can see the SHA-hashes of the entries in the output of

    $ git cat-file -p f0e12ff4a9a6ba281d57c7467df585b1249f0fa5
    100644 blob 9033296159b99df844df0d5740fc8ea1d2572a84    test
    100644 blob a7f8d9e5dcf3a68fdd2bfb727cde12029875260b    test2

(To reproduce this, the file test should contain the ASCII-encoded characters “hallo” (not \n-terminated), the file test2 should contain the ASCII-encoded characters “bla\n” (yes, \n-terminated)).

That is converted into the following object (uncompressed):

Finally, the file is deflated and stored on disk. For deflation, zlib is used. For my experiments I used the ruby zlip package. The following tiny script reads the tree from disk and inflates it, so you can look at it:


    require 'zlib'
    require 'fileutils'
    File.open("objects/f0/e12ff4a9a6ba281d57c7467df585b1249f0fa5") { |f|
        puts Zlib::Inflate.inflate(f.read)
    }

Monday, March 22, 2010

Why psychology and computer science should go together

Saturday, March 20, 2010

You can't cast Object[] to String[] in Java

Consider this snippet:
String[] str = (String[]) new Object[] { "Bla!"};.
That gives a ClassCastException. The short version of an explanation: an array instantiated by new Object[] is of a different class than one by new String[]. For the full glory of that fact, please see the Java Language Specification. As a teaser, I'll quote this snippet and its output from it:

class Test {
 public static void main(String[] args) {
  int[] ia = new int[3];
  System.out.println(ia.getClass());
  System.out.println(ia.getClass().getSuperclass());
 }
}

which prints:
class [I
class java.lang.Object.

Thursday, March 18, 2010

Thick tech books

James Hague:

"In order for a book to sell," said my publisher, "it's got to be thick. 600 pages thick."

Well, that about explains why tech books suck so badly.

How to test post-conditions

A pretty good student, Joshua Muheim, just asked me what the simplest way was to check postconditions. And my answer was the following:

Joshua, this is a research question! To the best of my knowledge, currently and in practice, developers use steamroller tactics to check postconditions. The standard way to verify that the LHC works as intended, in a method exterminate on a class EarthPopulation:

exterminate
    | oldSize | 
    oldSize := self size.
    LHC switchOn.
    self size should < oldSize.

That is, you copy whatever you intend to change and then compare new with old value. It isn’t difficult to imagine that this can lead to trouble if whatever you want to change isn’t cloneable. And of course it isn't possible to switch off the overhead of these assertions.* And that’s where Histoory (by Pluquet et al) comes to the rescue. Not only does it work independent of the cloneability, it also provides a much cleaner syntax:

exterminate
    [LHC swithOn]
        postCond: [:old | old size should < self size]

Alas, it isn’t in the latest flavor of any programming language, so I truly hope that Histoory will make it into Pharo :).

* I added this sentence after Joshua's correct observation of this problem.

Tuesday, March 16, 2010

Book on algorithm design

Recently, Sedgewick's algorithms book slipped through my hands again. It's a great book. I'd just like to point to a newer algorithms book that is thinner, and funner to read, and if you just want to browse for an hour or two, this is surely worth your while. That's Algorithm design, by Jon Kleinberg, and Éva Tardos.

Monday, March 15, 2010

Software merging, commit messages

What is the intent of a change? How do you merge conflicting changes? The current state of the art is dire. There's a survey, A state-of-the-art survey on software merging , but it's summary may be that there's no silver bullet to automated merging without understanding the intent of a commit.

How do you understand the intent of a commit? The best technique I am aware of today is … having the author provide a clear description of his intent. There's a paper about that, quite worth your while: Semantic patches considered helpful.

I told you that the situation is dire, didn't I?