Sunday, February 28, 2010

Distance measure between software artifacts

There's currently some interest in software visualization, which tries to speed up your understanding of source code by leveraging your brain's great ability to discern patterns in images. Below is shot of Codemap, visualizing something.





To draw a map such as the above, you need an underlying metric that tells you how far apart the code artifacts should be. (You probably won't be able to draw a map where the distances are exactly what you want them to be, but you may get close.) Now, what is used as a metric between software artifacts (islands, in the above image)? Codemap uses the vocabulary of the artifact, that is: artifacts that use many same words will end up closer together in the overall drawing.

A botfly moves along a herd of cattle. It finds every of the beasts hospitable, but it can't live without a herd. Thus, Richard Dawkins calls the herd the botfly's "Archipelago" (in The greatest show on earth, p. 253). This inspires me to a metric which is supposed to measure similarity of software artifacts. The idea is to assume that if two artifacts are hospitable to the same code snippet, then they should provide similar environments. Thus, you could measure how in the evolution of a software program, snippets wandered from one place to another, and assume that whenever a code snippet is moved from one artifact to another, these two artifacts increase in our metric's similarity.

It seems difficult to bring forth evidence that software maps aid understanding in the first place, thus I wouldn't know how to show that a metric like the one outlined above is any better than a comparison of metrics. Yet, there's some momentum in the field, so I'll save up the idea for a point in time when more light will be shed on the benefits of software visualization.