Thursday, December 17, 2009

ThothCompiler revamped

ThothCompiler has been revamped nearly completely. Get it by executing:

Gofer it
squeaksource: 'ASTBridge';
addPackage: 'ThothCompiler';
load


Then go to the Preference Browser (System->Preferences in the world menu) and choose ThothCompiler for the option "whichCompiler".

ThothCompiler allows you to use a modern AST in Pharo, rather than the 20-year-old default one. The code is parsed with Lukas' really fast RBParser, then transformed into the 20-year-old nodes, and compiled from there. What's the advantage?

Well, you can transform the modern AST nodes before compilation! It has a great visitor API that lets you do all kinds of transformations. You can use ThothCompiler as is just to feel fancy, or better: subclass it and override the transform: method, in which you can transform the AST to your liking.
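The parse-transform-compile idea carries over to other languages too. Here is a rough analogy using Python's ast module (not ThothCompiler's API; SwapAddToMul is a made-up toy transformation):

```python
import ast

# Sketch: parse source, rewrite the AST with a visitor, then compile.
class SwapAddToMul(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform children first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()  # rewrite every + into *
        return node

tree = SwapAddToMul().visit(ast.parse("result = 6 + 7"))
ast.fix_missing_locations(tree)  # new nodes need source positions
ns = {}
exec(compile(tree, "<example>", "exec"), ns)
print(ns["result"])  # the + was compiled as *, so this prints 42
```

The point is the same as with ThothCompiler: the transformation happens on the tree, before any bytecode exists.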

ThothCompiler is now used in Helvetica (albeit under the name of RBCompiler, because things can never have enough names!).

Changes since the last blog post:
  • Now uses the RBParser, which is much faster than SmaCC
  • No more dependency on the NewCompiler or AST package
  • Adds a preference to the system (in the system menu), where you can choose the compiler, even when you can't compile anymore :)

Tuesday, December 15, 2009

What to keep in mind when playing with compilers in Pharo

If you play with the compiler in Pharo, here's an urgent hint: don't lock yourself out of your image! If you break the compiler, you want to have something to fall back to. So here's what you do.

The compiler for the system is whatever Behavior compilerClass returns, so change that method to read out a preference:

Behavior>>compilerClass
"Answer a compiler class appropriate for source methods of this class."
| default |
default := Compiler.
^ (Smalltalk classNamed: (Preferences valueOfPreference: #whichCompiler ifAbsent: [^ default])) ifNil: [default]


That's step 1. Second step is to create the preference, so you can read it out at all:

Preferences addTextPreference: #whichCompiler category: #compiler default: 'Compiler' balloonHelp: 'Compiler to be used throughout the system'.

That's it, you're set. Happy hacking!

Monday, November 30, 2009

Code beauty, J vs. Mathematica

I drew the equivalent kernel of a smoothing spline in J.



Code:
t =: i: 10 j. 1000
y=:(%2) * (^-|t% (%2)) * sin(( (|t)* (%:2) % 2 )+ (pi%4))
plot t;y


This is the Mathematica version:

Sunday, November 29, 2009

The J programming language

Today I went through the first few tutorials of the J programming language. J is a treat. Human language has many devices that keep things short and explicit. While in context-oriented programming people are still wondering how to properly identify and model the "context" of a statement, natural language simply goes and changes the behaviour of nouns by specifying them more precisely with adjectives. Compare that with modern object-oriented programming languages, which at best only have nouns (classes) and verbs (methods).

The J language tries to harness the power of the natural languages by providing nouns, verbs, adverbs, conjunctions, and comparatives. I'd love to see an OO version with adjectives.

The tutorials are splendid and the IDE helpful, which I won't elaborate on because James Hague wrote such a nice piece about it. Oh, and he raves about the built-in plotting. Here's some turtle geometry, similar to an example that they provide:



The source code was:
show repeat 20;'point (n =: n+1) fd 1 rt 15'[n=: _2
to 'goreturn a b';'goto b goto a'
to 'goreturnall a'; 'goreturn for a,.i.20'
show goreturnall for i.20




The article is over. Just one last thing: there's a funny glitch in the tutorials. They assume that you have some familiarity with Hilbert matrices and alternating binomial coefficients, but you are not expected to know how to handle a mouse:
The color window can be moved away from text that it may obscure by clicking on the top bar, holding the button down, and dragging it to a new position before releasing the mouse button. –– Chapter 2, lesson 2 in the J tutorial

Saturday, November 21, 2009

And it's done, Phexample has no more lolcats

Image from Wikipedia, see http://en.wikipedia.org/wiki/File:Yet_another_lolcat.jpg for the author. CC license.

Lolcats have left the building, Phexample reads easier now.

Stack new isEmpty should not be true.

To get a meaningful error message out of this test, you're faced with a small dilemma. The Phexample framework lives in the should method; thus, for all that should sees, it is called on true, so the test might as well have been

true should not be true.

Now, to squeeze a reasonable error message out of this, such as "isEmpty should return false, but got true," one needs to find the originating code snippet in the caller. Thus, the execution depends on who called you. Maybe not perfectly object-oriented, but context-oriented! There we have a new buzz-word: COP, context-oriented programming.

I learned a few things on the way: Bytecodes in Smalltalk have different sizes, thus you can't just walk backwards from where you are. The containing method must be read forwards, and once you hit the position of the current execution, you know you're a bit too far. Also, you find the current stack frame using the pseudo-variable thisContext, though there are some performance issues with it.
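The forward walk can be illustrated with a toy instruction set in Python (the opcode sizes here are invented, not Squeak's actual bytecodes):

```python
# Hypothetical opcode -> instruction size; real Smalltalk bytecode
# sets have variable sizes too, which is the whole point.
SIZES = {0x10: 1, 0x20: 2, 0x30: 3}

def instruction_starts(code):
    """Decode forward from the start, collecting instruction boundaries."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += SIZES[code[pc]]
    return starts

def instruction_before(code, pc):
    """Start of the instruction preceding pc; None if pc is the first."""
    prev = None
    for start in instruction_starts(code):
        if start >= pc:  # we went one step too far: stop
            break
        prev = start
    return prev

code = bytes([0x10, 0x30, 0, 0, 0x20, 0])
print(instruction_before(code, 4))  # the 3-byte instruction starts at 1
```

Because sizes vary, there is no way to decode backwards from position 4; only the forward scan reveals where instructions begin.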

By the way, you can't really freeze and copy the stack (someone forbade it in the source, for whatever reason); you're analyzing it as it is being used for the current computation. A bit scary, but it works quite nicely. Pharo ftw! Try this in JavaScript!

Get the latest version of Phexample at SqueakSource (there's a Gofer script on that page that will do the downloading for you!).

Thursday, November 19, 2009

Phexample's lolcats syntax

In this post I'll clear my throat for what is to come. The number one criticism of Phexample is that it sometimes reads like lolcats, because real code that asserts that a stack be empty reads like

stack should not be isEmpty.

The complaint feels like an instance of complaining about how stupid it is that in OS X you need to drag CDs to the waste bin to eject them, or how in Windows the bin stands on the desktop. Of course the metaphor is broken there, but you only notice because the metaphor otherwise works great.

And now compare that to the usual alternative, which is assert. The first thing you learn about assert is that nobody can remember which argument is the expected and which is the actual value (I take this point and some of the discussion below from Josh Graham's discussion of the assert syntax). We're talking about assertEquals(stack.isEmpty, true). Or the other way around. I certainly can't remember it either.

In Java, improvements have been suggested:
assertEquals(expected(true), actual(stack.isEmpty()));

I find that while Phexample does not allow you to express your examples in a way that completely parses as natural language, it certainly reads easily enough and gives a natural and easy distinction between the expected and the actual.

So there's my analogy: no other testing framework is required to express its tests in a way that perfectly parses as human language. It is only because the tests already look so close to perfection that people single out the mismatch and decry it. I don't think this is an interesting problem worth solving at all.

Yet, the criticism follows Phexample wherever it goes, so I'll try and allow phrases like

stack isEmpty should be true.


The difficulty in making the above code work is that the testing framework is only activated after isEmpty has already been evaluated. This means it is no longer on the call stack, so we need to analyze the source code to find out which method caused the failure, in order to display a meaningful error message.

Wednesday, November 4, 2009

Telling other people how to install your code

Previously, I created dirty ScriptLoader tricks to make it easier for others to download my code.

I am pleased to find out that Gofer is even easier than my dirty tricks, well without being a dirty trick: Gofer on Lukas's blog.

Phexample, because examples expand on one another

Imagine you want to test the Stack class, and your first test just
creates a stack and checks that it's empty.

Then in your second test, you create the same empty stack, but now
push an element 'apple' and check if the size is one.

Then in your third test, you do all the same, and then you pop the
element and check that it is an apple.

Wouldn't it be great if you could write your test cases such that they
expand on one another? If you didn't need to copy-paste the code
of the previous test, but could just say that you require the previous one!

Well, that's what PhExample lets you do. And as a bonus, tests are
skipped if the test they expand on does not pass. This gives you fewer
failing tests, because examples are executed only if the examples they
expand on pass. That leaves you with the simplest failing examples in
your set to look at!

And best of all, you can run all examples with ye old TestRunner!

The example discussed above would read in code as follows:


EGExample subclass: #ForExampleStack


shouldBeEmpty
"Create the empty stack"
| stack |
stack := Stack new.
stack should be isEmpty.
stack size should = 0.
^ stack


shouldPushElement
"Push one element"
| stack |
stack := self given: #shouldBeEmpty.
stack push: 42.
stack should not be isEmpty.
stack size should = 1.
stack top should = 42.
^ stack


shouldPopElement
"And pop it again"
| stack |
stack := self given: #shouldPushElement.
stack pop should = 42.
stack should be isEmpty.
stack size should = 0.
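The mechanism can be sketched in a few lines of Python. All names here are made up (this is not PhExample's API): examples return their result, given pulls in a prerequisite's result, and an example whose prerequisite failed is skipped rather than failed.

```python
class Examples:
    def __init__(self):
        self.results = {}  # example name -> (status, value)

    def given(self, name):
        # Reuse the prerequisite's result; abort if it did not pass.
        status, value = self.run(name)
        if status != "pass":
            raise RuntimeError("prerequisite did not pass")
        return value

    def run(self, name):
        if name in self.results:  # run each example at most once
            return self.results[name]
        try:
            result = ("pass", getattr(self, name)())
        except RuntimeError:
            result = ("skip", None)  # a prerequisite failed
        except AssertionError:
            result = ("fail", None)
        self.results[name] = result
        return result

class StackExamples(Examples):
    def should_be_empty(self):
        stack = []
        assert not stack
        return stack

    def should_push_element(self):
        stack = self.given("should_be_empty")
        stack.append(42)
        assert stack == [42]
        return stack

ex = StackExamples()
print(ex.run("should_push_element"))  # ('pass', [42])
```

Unlike the real framework, this toy shares the prerequisite's object between examples instead of producing a fresh one per run; it only illustrates the dependency and skipping logic.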


Find the code at http://www.squeaksource.com/phexample.html

You can install it as follows using Gofer:


Gofer new
squeaksource: 'phexample';
addPackage: 'Phexample';
load


The repository is write-global, so feel free to drop your ideas!

PhExample is based on JExample by Lea Haensenberger, Adrian Kuhn, and
Markus Gaelli. See JExample

Wednesday, September 30, 2009

Introducing ThothCompiler

ThothCompiler combines the strengths of the two most popular Smalltalk compilers. It takes the extensibility and the wealth of the AST implementation from the NewCompiler and combines it with the most important asset of the old Compiler that ships with Squeak by default: it actually works.

Installation


The installation is fantastically easy. Go to Thoth on Squeaksource and get ASTBridgeLoader.

In a workspace, then execute:

ASTBridgeLoader load


Add the method compilerClass to the CLASS side of your class to set the compiler for that class. Thoth installs itself as the default compiler of the image. If you're unhappy with that, for example because you don't like the lack of error messages when you accept uncompilable code, you can choose for each class which compiler is to be used. Behavior>>compilerClass sets the default.

How do I hack into it?


Subclass ThothCompiler and override #compile:in:classified:notifying:ifFail: and #evaluate:in:to:notifying:ifFail:logged:. Transform the AST before generate is called. Perhaps I should provide some infrastructure here …

How it works


It is a really simple thing: It uses the parser of the NewCompiler, then transforms the AST into the old model, and then uses the backend of the old compiler to compile.

Wednesday, September 16, 2009

Blocks in Smalltalk can have their own temporary variables

Did you know that the following appears to be Smalltalk code? Runs fine in the current Pharo.

[:a | |b| 2] value: 3

Sunday, September 6, 2009

Explaining Mathematica with the Solution for Problem 59 from Project Euler

The solution to problem 59 is perhaps helpful to see what Mathematica is good at and how it basically works. The task is here. Let me summarize it. The task contains a link to a text file full of numbers. These numbers together are an encrypted text, the password being three lowercase letters. The decryption works like this: You convert the password into its ASCII values, and then XOR the first byte of the encrypted text with the ASCII value of the first letter in the password. You iterate this for all three letters, and then start over with the first letter of the password for the fourth number in the encrypted text. And so forth, until you are done. The task is to find the password.

Ok, so, the first thing Mathematica rocks at is importing. Here it probably isn't a big deal, but I find it noteworthy anyhow … getting the encrypted file is this:

text = Import["http://projecteuler.net/project/cipher1.txt", "csv"] // First;


Here, Import leaves a pair of braces around the real result, which the First function disposes of.

Now the idea of my checking is this. I'll take every third character from the encrypted file and store it in a list. Then I will try for each possible first letter of the password which letter would yield the most printable characters in the decrypted text. I will use this very narrow definition of a printable character:

printable[a_] := 65 <= a <= 90 ~Or~ 97 <= a <= 122


One thing we cannot see here on Blogger is that Mathematica allows mathematical typesetting, presenting this definition with a beautiful vee for the Or, and with inequalities that actually look like inequalities.

The next task is to give each letter of the password a score. Let us assemble the procedure from the ground up. The goal is to build a function which takes every third element of the encrypted text, applies BitXor with the key, and then counts how many characters are displayable.

Getting every third element from the encrypted text, starting with the 1st letter, is expressed as

text[[1 ;; ;; 3]]

We can refer to the result of the previous computation as %. Let us try 103 as the first letter of the password, that would be the letter "g." Then we decrypt as follows:

BitXor[103,%].

And finally, we count the printable letters in the resulting list. Mathematica provides a Count function. Count is a nice function to demonstrate something rather unique to Mathematica: patterns. Let us first see our code, which counts printable letters:

Count[%, _?printable].

Here, _?printable is the pattern. The underscore stands for "anything", and the question mark specifies that the anything must evaluate to true when passed into the following function, printable. This is a very simple pattern, you can do cooler things, like the pattern {1,2,_}, which matches lists of three elements whose first two elements must be 1 and 2.

Now altogether the score function looks like this, where i ranges between 1 and 3:
score[i_, letter_] := 
Count[BitXor[letter, text[[i ;; ;; 3]]] , _?printable]


The task states that valid candidates for the letters of the password must be lowercase. These have ASCII codes between 97 and 122. The function SelectMaximizer is an invention of mine. It chooses the element of a list that maximizes a supplied function. The supplied function will be score, of course, so we can find the first letter as

SelectMaximizer[Range[97, 122], 
score[1, #] &]


The confusing thing here is the closure, score[1, #] &. In Mathematica, everything that ends in an ampersand is a closure, or a local function. The hash symbol is its argument. We need it because score accepts two arguments (the position of the letter in the password, and the actual letter), and we'd like to bind only the second one as the closure's argument.

The function call will dutifully find the character "g" as the first letter of the password. To find all 3, we use the Table function. The Table function takes an expression with an undefined variable, i, and we tell it to use that variable as an iterator, assuming values from 1 to 3. The following code thus finds all three letters of the password:

password = 
Table[ SelectMaximizer[Range[97, 122],
score[i, #] &], {i, 3}];


To decrypt the text with our password, we partition the encrypted text into runs of three, which we can then BitXor. Mathematica ships with a Partition function.

Partition[text, 3] returns a list of partitions of length 3 of the list text. The Map function applies a closure to each element of a list. For example, Map[# * # &, Range[4]] computes the first four square numbers.

Our first shot at a decoding function is thus:

Map[BitXor[password, #] &, Partition[text, 3]]

We almost get it right. Unfortunately, the encrypted text's length is not divisible by three, and thus Partition just 'swallows' the last letter. We can tell Partition to attach the remaining letter to the list. We use a little bit of trickery, because simply attaching the last letter would attach a list of length 1 to the list of partitions, but our partitions must have length 3 for the BitXor to work. Well, we can tell Partition to add the left-overs and then pad to a length of 3 with the password itself, which will become 0 after the XOR, so we're really appending null bytes to the string.

Together, the decoding function is now

decode[p_] := Map[BitXor[p, #] & ,
Partition[text, 3, 3, 1, p]]


where p stands for password.
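The padding trick translates to Python directly (a sketch of the same idea, not the Mathematica code): pad the tail of the ciphertext with the key bytes that line up at those positions, so they XOR to zero and can be stripped off afterwards.

```python
def decode(cipher, key):
    # Pad the ciphertext up to a multiple of len(key) with the key
    # bytes that would line up at those positions; XOR then yields
    # 0 there, so the padding can be stripped off the plaintext.
    n = len(key)
    pad = (-len(cipher)) % n
    padded = cipher + bytes(key[(len(cipher) + j) % n] for j in range(pad))
    plain = bytes(c ^ key[i % n] for i, c in enumerate(padded))
    return plain.rstrip(b"\x00")  # assumes the plaintext has no trailing NULs

key = b"god"
cipher = bytes(p ^ key[i % 3] for i, p in enumerate(b"hello"))
print(decode(cipher, key))  # b'hello'
```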

For what it's worth, the password is "god" and the result is chapter one of the Gospel of John. The task still requires us to compute the sum of the ASCII codes of the decoded text. Our decode function really returns a nested list; to compute the sum of all entries, we could either use Flatten to reduce it to depth 1, or we could tell the Total function to operate on the second level of a nested list.

Total[decode[password], 2]


which returns 107359.
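For cross-checking, the whole attack fits in a few lines of Python. This is a sketch on synthetic data (the helper names are mine, and the Project Euler file is not downloaded here): for each key position, pick the lowercase letter that makes the most bytes of that slice printable.

```python
def printable(a):
    # Same narrow definition as above: ASCII letters only.
    return 65 <= a <= 90 or 97 <= a <= 122

def crack(cipher, key_len=3):
    key = []
    for i in range(key_len):
        sl = cipher[i::key_len]  # every key_len-th byte, like text[[i ;; ;; 3]]
        best = max(range(97, 123),
                   key=lambda k: sum(printable(c ^ k) for c in sl))
        key.append(best)
    return bytes(key)

# Encrypt a known plaintext with the key "god", then recover the key.
plain = b"the quick brown fox jumps over the lazy dog " * 4
key = b"god"
cipher = bytes(c ^ key[i % 3] for i, c in enumerate(plain))
print(crack(cipher))  # b'god'
```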

The complete code is

text = 
Import["http://projecteuler.net/project/cipher1.txt",
"csv"] // First ;

printable[a_] := 65 <= a <= 90 ~Or~ 97 <= a <= 122

score[i_, letter_] :=
Count[BitXor[letter, text[[i ;; ;; 3]]] , _?printable]

decode[p_] := Map[BitXor[p, #] & ,
Partition[text, 3, 3, 1, p]]

password =
Table[ SelectMaximizer[Range[97, 122],
score[i, #] &], {i, 3}];

Total[decode[password], 2]


The complete code can be downloaded as a Mathematica notebook or as a PDF file.

Friday, August 14, 2009

A Small Mathematica Vs C Benchmark

Sometimes I get frowned upon because I do expensive computations in Mathematica. Some expensive computations will be very fast in Mathematica, of course. Try computing (10^4)//Factorial.

But what about the cheap computations that you could just as well do in C? Project Euler gave me the chance to try it out with an easy problem. I was solving problem 9, which in short searches for natural numbers a<b<c such that a^2+b^2=c^2 and a+b+c=1000.

The C-program that finds these numbers is


#include <stdio.h>

int main(void) {
    for(int a = 1; a < 10000; a++) {
        for(int b = a + 1; b <= 10000; b++) {
            int c = 1000 - a - b;
            if(a * a + b * b == c * c) {
                printf("%d\n", a * b * c);
                return 0;
            }
        }
    }
    return 0;
}


Note that I de-optimized a little by bounding a and b in [1, 10000], which is ten times greater than it needs to be. The result is still correct, but now it's slow enough that we can measure something.
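For a sanity check of the algorithm itself (not of the timings), the same brute force with the tight bound fits in a few lines of Python:

```python
def euler9(total=1000):
    # Search a < b < c with a^2 + b^2 = c^2 and a + b + c == total.
    for a in range(1, total):
        for b in range(a + 1, total - a):
            c = total - a - b
            if a * a + b * b == c * c:
                return a * b * c  # 200 * 375 * 425
    return 0

print(euler9())  # 31875000
```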

The run-time in C is 0.16 seconds on my machine.

Bringing this to Mathematica is more than straightforward:


Module[{a, b, c},
For[a = 1, a <= 10000, a++,
For[b = a + 1, b <= 10000, b++,
c = 1000 - a - b;
If[a a + b b == c c, Return[a b c]]
]
];
0
]



Now the run-time is at 8.5 seconds. But! Mathematica can compile! To something faster, that is. It's as easy as wrapping a Compile around the computation. We get:


Compile[{},
Module[{a, b, c},
For[a = 1, a <= 10000, a++,
For[b = a + 1, b <= 10000, b++,
c = 1000 - a - b;
If[a a + b b == c c, Return[a b c]]
]
];
0
]
]



And now we're down to 0.38 seconds. Just as a reminder:

Table 1. Run times of the Project Euler problem 9 algorithm in seconds. M stands for Mathematica.

         C      uncompiled M   compiled M
         0.16   8.53           0.38



Which brings me to the following conclusion: for basic arithmetic computation, Mathematica will slow you down by a factor of 40 if you use it only casually, or by a factor of 2 if you have some experience with it (not everything is nicely compilable, so this is a bit optimistic).

Mathematica taketh away, but it also giveth: The library functions are fast, and there's plenty of them. You can treat Mathematica just like any scripting language: If the operations in your inner loop are costly, Mathematica is good for you! Otherwise, you pay a price for a wide range of library-functions. And it is worth delving into that library.

Monday, July 6, 2009

Pharo – perhaps Squeak 2.0?

Saturday, two days ago, I had the pleasure of being present at the Berne Pharo sprint. It was not my first contact with Pharo, but close.

If I understand it correctly, then Pharo emerged because Squeak was torn into too many directions. Squeak was the one platform for child education, multimedia websites, and webserver development. Some code in the Squeak image was never released under a free licence, and the Squeak-licence itself is not an OSI-certified open-source licence. Squeak has more than 2000 open issues in its bugtracker, Mantis.

Pharo is a fork of Squeak that aims at removing cruft and turning Squeak Smalltalk into a more traditional programming environment with a small core, stable and fast-evolving.

I am slightly suspicious about the reduction of the image size. I don't know anything about it, but I still recall how, in an interview, Dan Ingalls spoke about the advantages of an image that is NOT split into packages, where everything calls everything. I recall for myself that when I wrote a web application that was to send emails by itself, I was delighted to see that an SMTP client is included in a standard image already. It was fantastically easy to send an email message. On the other hand, I discovered at least one bug in the SMTP client code.

Still, I will use Pharo, and not Squeak, henceforward. The user interface is revised and does not look childish anymore. The number of open issues is small, and despite the alpha status, Pharo appears to be a lot more stable than Squeak. Pharo is already used as the reference implementation of Seaside, and thus it is only a matter of time until it draws a significant share of the Smalltalk world into using it for web development. Depending on how well it manages to facilitate the installation of external packages, I would like to believe that for non-educational purposes, Pharo will take over all of the Squeak world and serve as Squeak 2.0.

Tuesday, May 26, 2009

How fast is Comanche?

The other day I came across this guy who wants to serve webpages in less than a millisecond. Without parallelization. So, his web page is really simple, it looks like this: <h1>Hello TWUG!</h1>.

His point is that things like Twitter are difficult to scale across several machines. The slowness of dynamic pages, he claims, is papered over with advanced caching strategies. He goes on to check how fast certain languages are at serving dynamic content. He gets these numbers:


Clearly, his machine is faster than mine, because serving the same file statically, my Apache (single-threaded using httpd -X) can do only 1850 requests per second, that is 0.5 ms per request. (I use this benchmark: httperf --hog --num-conn 10000 --uri http://localhost:9090/?name=Niko). Here are the numbers I get:


Anyway, I find it interesting that Comanche is not far from Apache. Comanche can serve the same page dynamically (the name is a GET parameter) in 1.1 ms, where Apache takes 0.5 ms for the static version. As for Apache-PHP vs. Comanche, I'd call it a tie.

The Squeak code for the Http server would be this:

ma := ModuleAssembly core.
ma addPlug: [:request |
    HttpResponse fromString: (String streamContents: [:s |
        s nextPutAll: '<h1>Hello! ';
            nextPutAll: (request fields at: 'name');
            nextPutAll: '</h1>'])].
(HttpService startOn: 9090 named: 'Example') module: ma rootModule


By the way, as the above snippet suggests, a standard Squeak installation is quite capable of serving RESTful applications: just skip the Seaside framework and plug right into the Comanche web server.

For completeness, the PHP-code:


<h1>Hello <?=$_REQUEST['name']?></h1>
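For comparison, here is the same toy endpoint in Python's standard http.server module (my addition, not part of the original benchmark), reading the name GET parameter and echoing it back:

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Same behaviour as the Comanche and PHP snippets above.
        name = parse_qs(urlparse(self.path).query).get("name", ["world"])[0]
        body = ("<h1>Hello! %s</h1>" % name).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # stay quiet while benchmarking

# To serve: ThreadingHTTPServer(("localhost", 9090), HelloHandler).serve_forever()
```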

Update


By the way, this seems to be about 10 % faster:
s:=(HttpService on: 8080 named: 'Example Http Service').
s onRequestDo: [ :request |
HttpResponse fromString:
(String streamContents: [:s| s nextPutAll:'<h1>Hello! ';
nextPutAll:( request fields at: 'name');
nextPutAll: '</h1>']) ].
s start

Thursday, May 21, 2009

Adding new properties to a class

When you load a new toy into Squeak and play with it in a workspace, that is usually fun. So much more fun than creating a new class. I always suspect that I use the Squeak tools all wrong, but honestly, to create a new class with a couple of properties, it takes me five thousand clicks to set all the instance variables, create the accessors, and get them all properly initialized.

So, I still don't know if there might not be a much easier way to do it, but I'm fine with what I've dreamt up today. If I want to add a new instance variable to a class, including all accessors, I call this from a workspace:

ManyManyRelation addPropertyName: #oneLot.

And if I want to initialize the property to something, I use this:

ManyManyRelation addPropertyName: #otherLot lazyInitializeWith: [Set new].

The following method goes to the instance side of Class:
addPropertyName: aSymbol
self addInstVarName: aSymbol.
self compileAccessorsFor: aSymbol


And then:
compileAccessorsFor: aSymbol

"Compile instance-variable accessor methods
for the given variable name "
self compileSilently: aSymbol asString , '
^ ' , aSymbol classified: 'access'.
self compileSilently: aSymbol asString , ': anObject
' , aSymbol , ' := anObject' classified: 'access'


You can now call something like ManyManyRelation addPropertyName: #oneLot, and it adds an instance variable to ManyManyRelation, including both accessors.

Let's push it one step further. Hooking into initialize really doesn't feel right: if objects are about modelling the real world, then what exactly corresponds to object initialization? While still not optimal, the following creates lazy initializers in the accessors.

The following two methods go as instance methods into Class:
addPropertyName: aSymbol lazyInitializeWith: aBlock
self addInstVarName: aSymbol.
self compileAccessorsFor: aSymbol lazyInitializeWith: aBlock


compileAccessorsFor: aSymbol lazyInitializeWith: aBlock 
| initializeCommand decompileString |
decompileString := aBlock decompileString.
initializeCommand := decompileString copyFrom:
2 to: decompileString size - 1.
self compileSilently: aSymbol asString , '
^ ' , aSymbol , ' ifNil: [ ' , aSymbol ,
' := ' , initializeCommand , ']' classified: 'access'.
self compileSilently: aSymbol asString , ': anObject
' , aSymbol , ' := anObject' classified: 'access'


You use it by calling ManyManyRelation addPropertyName: #otherLot lazyInitializeWith: [Set new].
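A rough parallel of these lazy accessors in Python (a sketch with made-up names, nothing to do with Squeak's Class protocol): attach a lazily initialized property to an existing class at runtime.

```python
def add_property(cls, name, default_factory):
    attr = "_" + name  # backing slot for the instance variable

    def getter(self):
        # Lazy initializer: create the default on first access.
        if not hasattr(self, attr):
            setattr(self, attr, default_factory())
        return getattr(self, attr)

    def setter(self, value):
        setattr(self, attr, value)

    setattr(cls, name, property(getter, setter))

class ManyManyRelation:
    pass

add_property(ManyManyRelation, "otherLot", set)

r = ManyManyRelation()
print(r.otherLot)  # set(): initialized lazily on first access
```

As in the Smalltalk version, the default never lives in an initialize method; it is created the first time the accessor is used.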

By the way, I would refrain from using (CreateAccessorsForVariableRefactoring variable: #hasMany class: OneManyRelation classVariable: false) execute, which is what happens when you create accessors using the OmniBrowser, the reason being that it behaves strangely when the object already responds to a message of that name, as typically happens with #name.

Sunday, May 17, 2009

Use relational databases and lock yourself out from your own data

Searching a certain pattern in a graph of your data is the most exciting task. I learned that by writing a Master's thesis on algorithms that solve NP-hard problems. I know that years of research have gone into people trying to come up with smart algorithms that try to answer really difficult questions on graphs. I also know that researchers carefully take care that each of these graph problems really stands for a computation problem of great significance to real life computing. Graph algorithms help you find travelling routes, understand the preferences of your customers, and help you find the optimal arrangement of your office.

I am talking about graphs defined by your data. If your application has interesting objects, then maybe they form an interesting graph and perhaps this way of thinking brings you a fantastically fast algorithm that serves your business logic.

If you have an interesting query on your data, it may not even be POSSIBLE to formulate the query in SQL. Such a query might be: I have a band, give me the cluster of similar bands!

A relational database promises to solve all your data-searching problems if you give it all your data. That's a REALLY big promise. In fact, it's esoteric. Why would a query optimizer be able to do what researchers otherwise write papers on? If you have an interesting SQL query, then I bet you that your database is rather slow in answering it. Perhaps it hides this by computing the answer long before the query actually shows up, but the cost is still quite high.

For interesting queries on your data, I suggest you do not trust the esoteric and thoroughly overrated RDBMS query optimizers. Trust your brain and come up with a smart graph algorithm. If you use a relational database, you have locked yourself out from your own data, because getting a single object, the equivalent of one data row, out of an RDBMS typically takes about 20 ms. If you navigate through the graph of your objects and each step costs you 20 ms, you know exactly how many steps through your graph are still ok if you plan to deliver the rendered website in less than a second.

Object databases, instead, show fantastic speed when you are navigating through the object graph. Give object databases a shot and find your web application soaring all of a sudden. And let's stop pretending that RDBMSs are easy. I've spent more time than I'd like to admit pondering the access paths that a query optimizer chose and just how exactly to nudge it into something smarter. Which index to create? How much memory for what? RDBMSs don't think for you; they force you to first ponder a smart access strategy and then ponder how to nudge the RDBMS into taking it.

Relational databases are slothful, bulky, and overall a great waste of time, yours and the computer's. I avoid them wherever I can and recommend you do the same.

Thursday, May 14, 2009

SandstoneGOODS

SandstoneGOODS gives SandstoneDB (you might have to click that link twice) a backend based on the GOODS object database. Using SandstoneGOODS, you can have the easiest persistence and transaction safety imaginable for your objects, coupled with reasonable performance and scalability, in Squeak Smalltalk.

SandstoneDB is the coolest OODBMS I am aware of, but until now there were only two ways of using it: either from Squeak using only one Squeak instance, or from a full-blown GemStone. SandstoneGOODS allows you to use SandstoneDB with one session per client in Squeak Seaside, which, some argue, is a very performant way to host Seaside web pages.

So, how does it work?


  1. Get a working copy of GOODS from http://www.garret.ru/goods.html and install it.
  2. Put the following file into the directory where you want the database to reside:
    1
    0: localhost:6000

    and call this file goods.cfg.
  3. Enter a shell and navigate to the directory where you put the goods.cfg file. Execute goodsrv goods
  4. Start up Squeak and install SandstoneGOODS (and all its requirements, namely GOODS, BTree, SandstoneDb).
  5. Shut down the image and restart it.
  6. You're ready to go! Use SandstoneDb as described by Ramon Leon.


Performance


I can execute 1000 small commits in 30 seconds:

[ 1000 timesRepeat:[ SDChildMock new save] ] timeToRun.


I can execute one commit of 1000 small objects in 3 seconds:

[ SDActiveRecord commit:[ 1000 timesRepeat:[ SDChildMock new save ] ] ] timeToRun.

Read speed is excellent: reading 5000 objects from the db can be done in 1 second:

[ SDChildMock findAll ] timeToRun.

Once they're in the cache, they are read in 0.1 seconds.

Altogether, I think SandstoneGOODS totally rocks, and you can write your whole app using SandstoneGOODS faster than it previously took you just to write the ORM mapping definitions. Please tell me about any bugs you find at niko.schwarz@googlemail.com. The repository allows public writing. You can find the source code on SqueakSource.