Sunday, February 28, 2010

Distance measure between software artifacts

There's currently some interest in software visualization, which tries to speed up your understanding of source code by leveraging your brain's great ability to discern patterns in images. Below is a screenshot of Codemap visualizing a code base.

To draw a map such as the one above, you need an underlying metric that tells you how far apart the code artifacts should be. (You probably won't be able to draw a map in which the distances are exactly what you want them to be, but you may get close.) So what is used as a metric between software artifacts (the islands in the image above)? Codemap uses the vocabulary of the artifacts: artifacts that use many of the same words end up closer together in the overall drawing.
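To make the vocabulary idea concrete, here is a minimal sketch in Python. It is not Codemap's actual pipeline. It assumes a plain bag-of-words model: split identifiers into lowercase words, count them, and take one minus the cosine similarity of the two word-count vectors as the distance. The helper names `vocabulary` and `cosine_distance` and the sample snippets are hypothetical.

```python
import math
import re
from collections import Counter

def vocabulary(source: str) -> Counter:
    """Count the words in a piece of source code, splitting camelCase identifiers."""
    words = []
    for token in re.findall(r"[A-Za-z]+", source):
        # "parseToken" -> ["parse", "Token"], "HTTPServer" -> ["HTTP", "Server"]
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", token))
    return Counter(w.lower() for w in words)

def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two word-count vectors (0 = same vocabulary mix)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    if norm == 0:
        return 1.0  # an empty artifact shares no words with anything
    return 1.0 - dot / norm

# Hypothetical artifacts: two about tokens, one about drawing.
parser = vocabulary("def parseToken(tokenStream): return tokenStream.next()")
lexer = vocabulary("def readToken(stream): token = stream.peek()")
plot = vocabulary("def drawMap(canvas): canvas.fill()")
print(cosine_distance(parser, lexer) < cosine_distance(parser, plot))  # True
```

The cosine measure ignores how long each artifact is, so a small class and a large one that use words in the same proportions still come out close.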

A botfly moves around within a herd of cattle. It finds every one of the beasts hospitable, but it cannot live without a herd. Thus, Richard Dawkins calls the herd the botfly's "Archipelago" (in The Greatest Show on Earth, p. 253). This inspired me to think of a metric for the similarity of software artifacts. The idea is that if two artifacts are hospitable to the same code snippet, they probably provide similar environments. So you could trace how, during the evolution of a program, snippets wandered from one place to another, and whenever a snippet moves from one artifact to another, increase the similarity of those two artifacts in the metric.
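A minimal sketch of how the move-based ("hospitality") similarity could be counted, in Python. It assumes that moves have already been extracted from the version history (for example by a clone or diff tool) as hypothetical `Move(snippet, source, target)` records. The mapping from move count to distance, 1 / (1 + n), is an arbitrary choice made for illustration, and pairs with no recorded move get no distance at all.

```python
from collections import Counter
from typing import NamedTuple, Optional

class Move(NamedTuple):
    """One observed relocation of a code snippet between two artifacts (hypothetical record)."""
    snippet: str
    source: str
    target: str

def hospitality_counts(moves: list[Move]) -> Counter:
    """Count, per unordered pair of artifacts, how often a snippet moved between them."""
    counts: Counter = Counter()
    for m in moves:
        if m.source != m.target:
            counts[frozenset((m.source, m.target))] += 1
    return counts

def hospitality_distance(counts: Counter, a: str, b: str) -> Optional[float]:
    """More shared snippets -> smaller distance; None if no snippet ever moved between a and b."""
    if a == b:
        return 0.0
    n = counts.get(frozenset((a, b)), 0)
    if n == 0:
        return None
    return 1.0 / (1.0 + n)

# Hypothetical history extracted from version control.
history = [
    Move("checksum", "Parser.java", "Lexer.java"),
    Move("escape", "Parser.java", "Lexer.java"),
    Move("logError", "Parser.java", "Renderer.java"),
]
counts = hospitality_counts(history)
print(hospitality_distance(counts, "Lexer.java", "Parser.java"))     # 0.3333333333333333
print(hospitality_distance(counts, "Parser.java", "Renderer.java"))  # 0.5
print(hospitality_distance(counts, "Lexer.java", "Renderer.java"))   # None
```

Lexer.java and Renderer.java never shared a snippet, so the sketch has nothing to say about their distance.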

It seems difficult to produce evidence that software maps aid understanding in the first place, so I wouldn't know how to show that a metric like the one outlined above is any better than other metrics, such as a comparison of vocabularies. Still, there's some momentum in the field, so I'll save the idea for a time when more light has been shed on the benefits of software visualization.


  1. Thanks for picking up the distance problem of software cartography!

    The idea of hospitality is very original, and I like that you define it in terms of code editing. It is, however, not an optimal distance measure. First of all, it requires historical data, which might not be available for all projects and is hard to retrieve at best. That aside, it would result in a distance metric that is undefined for most pairs of software elements. A good distance metric should be defined and non-constant for almost all pairs. If too many distances are undefined or constant, there is not enough global distance information to achieve a good layout.

    On the other hand, we would need to know more about the rationale for moving code from one place to another. Maybe it was moved because the code's origin was /not/ hospitable to that piece of code, in which case your distance would quantify the opposite of what you are looking for. I am just trying to find other ways to define hospitality … for example, through a technical reachability analysis of code snippets? You would pick a dozen snippets as kernels and then compute the hospitality for each class or file. But then, how do you define it: in terms of being close to the accessed data (which is technically straightforward to measure), or in terms of having the right responsibility (which is hard to quantify, but which we know to be a better design principle than data abstraction)?

  2. Measurement has always been useful to us, so people keep developing easier ways of measuring and better measuring tools.
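The first comment's objection, that a move-based metric is undefined for most pairs, can be quantified as the share of artifact pairs that have any distance at all. A minimal Python sketch, assuming we already know which pairs have a defined distance; `pair_coverage` and the sample artifacts are hypothetical:

```python
from itertools import combinations

def pair_coverage(artifacts: list[str], defined_pairs: set[frozenset[str]]) -> float:
    """Fraction of unordered artifact pairs for which a distance is defined."""
    all_pairs = [frozenset(p) for p in combinations(artifacts, 2)]
    if not all_pairs:
        # Fewer than two artifacts: nothing to lay out, treat as fully covered.
        return 1.0
    return sum(p in defined_pairs for p in all_pairs) / len(all_pairs)

# Hypothetical system of four artifacts where snippets moved between only two pairs.
artifacts = ["Parser", "Lexer", "Renderer", "Cache"]
moved = {frozenset(("Parser", "Lexer")), frozenset(("Parser", "Renderer"))}
print(pair_coverage(artifacts, moved))  # 0.3333333333333333
```

With n artifacts there are n(n - 1)/2 pairs, so coverage shrinks quickly as a system grows: a hypothetical system of 500 classes has 124,750 pairs, and even 1,000 pairs connected by moved snippets would cover less than 1% of them.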

