Niko Schwarz's science and programming: May 2009

Tuesday, May 26, 2009

How fast is Comanche?

The other day I came across this guy who wants to serve webpages in less than a millisecond. Without parallelization. So, his web page is really simple, it looks like this: <h1>Hello TWUG!</h1>.

His point is that things like Twitter are difficult to scale across several machines. The slowness of dynamic pages, he claims, is tried to be made up by advanced caching strategies. He goes on to check how fast certain languages are at serving dynamic content. He gets these numbers:

Clearly, his machine is faster than mine, because serving the same file statically, my Apache (single-threaded using httpd -X) can do only 1850 requests per second, that is 0.5 ms per request. (I use this benchmark: httperf --hog --num-conn 10000 --uri http://localhost:9090/?name=Niko). Here are the numbers I get:

Anyway, I find interesting to mention that Comanche is not far from Apache. Comanche can serve the same page dynamically (the name is a GET parameter) in 1.1 ms, for which Apache takes statically 0.5 ms. As for Apache-php vs. Comanche, I'd call it a tie.

The Squeak code for the Http server would be this:

ma := ModuleAssembly core.
ma addPlug: [:request | HttpResponse fromString: 
   (String streamContents: [:s| 
   s nextPutAll:'<h1>Hello! '; 
   nextPutAll:( request  fields at: 'name'); 
   nextPutAll: '</h1>']).].
(HttpService startOn: 9090 named: 'Example') module: ma rootModule

By this way, as the above snippet suggests, a standard Seaside installation for Squeak is quite able to make RESTful applications. Just avoid the Seaside framework and plug right into the Comanche web server.

For completeness, the PHP-code:

<h1>Hello <?=$_REQUEST['name']?></h1>

Update

By the way, this seems to be about 10 % faster:

s:=(HttpService on: 8080 named: 'Example Http Service').
s onRequestDo: [ :request |
 HttpResponse fromString: 
  (String streamContents: [:s| s nextPutAll:'<h1>Hello! '; 
  nextPutAll:( request  fields at: 'name'); 
  nextPutAll: '</h1>']) ].
s start

Thursday, May 21, 2009

Adding new properties to a class

When you load a new toy into Squeak and toy with it in a workspace, that is usually fun. So much more fun than creating a new class. I always suspect that I use the Squeak tools all wrong, but honestly, to create a new class with a couple of properties, it takes me fivethousand clicks to set all the instance variables, set the accessors and get them all properly initialized.

So, I still don't know if there might not be a much easier way to do it, but I'm fine with what I've dreamt up today. If I want to add a new instance variable to a class, including all accessors, I call this from a workspace:

ManyManyRelation addPropertyName: #oneLot.

And if I want to initialize the property to something, I use this:

ManyManyRelation addPropertyName: #otherLot lazyInitializeWith: [Set new].

The following method goes to the instance side of Class:

addPropertyName: aSymbol
  self addInstVarName: aSymbol.
  self compileAccessorsFor: aSymbol

And then:

compileAccessorsFor: aSymbol

"Compile instance-variable accessor methods
  for the given variable name "
   self compileSilently: aSymbol asString , '
       ^ ' , aSymbol classified: 'access'.
   self compileSilently: aSymbol asString , ': anObject' 
        , aSymbol , ' := anObject' classified: 'access'

You can now call something like ManyManyRelation addPropertyName: #oneLot, and it adds an instance variable to ManyManyRelation, including both accessors.

Let's push it one step further. Hooking into initialize really doesn't feel right––if objects are about modelling the real world, then what exactly corresponds to object initialization? While still not being optimal, the following creates lazy initizers in the accessors.

The following two methods go as instance methods into Class:

addPropertyName: aSymbol lazyInitializeWith: aBlock
   self addInstVarName: aSymbol.
   self compileAccessorsFor: aSymbol lazyInitializeWith: aBlock

compileAccessorsFor: aSymbol lazyInitializeWith: aBlock 
    | initializeCommand decompileString |
    decompileString := aBlock decompileString.
    initializeCommand := decompileString copyFrom: 
                2 to: decompileString size - 1.
    self compileSilently: aSymbol asString , '
        ^ ' , aSymbol , ' ifNil: [ ' , aSymbol , 
          ' := ' , initializeCommand , ']'  classified: 'access'.
    self compileSilently: aSymbol asString , ': anObject
    ' , aSymbol , ' := anObject' classified: 'access'

You use it by calling ManyManyRelation addPropertyName: #otherLot lazyInitializeWith: [Set new].

By the way, I would refrain from using (CreateAccessorsForVariableRefactoring variable: #hasMany class: OneManyRelation classVariable: false) execute, which is what happens when you create accessors using OmniBrowsers, the reason being it behaves strangely when the object already responds to a message, most typically like name.

Sunday, May 17, 2009

Use relational databases and lock yourself out from your own data

Searching a certain pattern in a graph of your data is the most exciting task. I learned that by writing a Master's thesis on algorithms that solve NP-hard problems. I know that years of research have gone into people trying to come up with smart algorithms that try to answer really difficult questions on graphs. I also know that researchers carefully take care that each of these graph problems really stands for a computation problem of great significance to real life computing. Graph algorithms help you find travelling routes, understand the preferences of your customers, and help you find the optimal arrangement of your office.

I am talking about graphs defined by your data. If your application has interesting objects, then maybe they form an interesting graph and perhaps this way of thinking brings you a fantastically fast algorithm that serves your business logic.

If you have an interesting query on your data, it may not even be POSSIBLE to formulate the query in SQL. Such a query might be: I have a band, give me the cluster of similar bands!

A relational database promises you to solve all your data searching problems if you give it all your data. That's a REALLY big promise. In fact it's esoteric. Why would a search path optimizer be able to do what otherwise researchers write papers on? If you have an interesting SQL query, then I bet you that your database is rather slow in answering it. Perhaps it hides it by computing the answer to the query long before the query actually shows up, but the cost is still quite high.

For interesting queries on your data, I suggest you do not trust the esoteric and thoroughly overrated RDBMS query optimizers. Trust your brain and come up with a smart graph algorithm. If you use a relational database, you have locked yourself out from your own data. That is because getting a single object that equals one data row out in a RDBMS takes typically about 20 ms. If you navigate through the graph of your objects and each step costs you 20 ms, you know just exactly how many steps through your graph are still ok if you plan to deliver the rendered website in less than a second.

Object databases, instead, show a fantastic speed when you are navigating through the object graph. Give object databases a shot and find your web application soaring all of a sudden. And let's stop pretending that RDMBSs are easy. I've spent more time than I'd like to admit pondering about the access paths that a query optimizer chose and just how exactly to nudge it to to something smarter. Which index to create? How much memory for what? RDBMSs don't think for you, they force to first ponder a smart access strategy and then ponder how to nudge the RDBMS into taking it.

Relational databases are slothful, bulky, and overall a great waste of time. Yours and the computer's. I avoid them wherever I can and I recommend you to do the same.

Thursday, May 14, 2009

SandstoneGOODS

SandstoneGOODS brings SandstoneDB (you might have to click that link twice) a backend to the GOODS object database. Using SandstoneGOODS, you can have the easiest thinkable persistence and transaction safety for your objects coupled with reasonable performance and scalability in Squeak Smalltalk.

SandstoneDB is the coolest OODBMS I am aware of, but until now there were only two ways of using it: either from Squeak using only one Squeak instance, or from a full-blown Gemstone. SandstoneGOODS allows to use SandstoneDB and one session per client in squeak seaside, which, some argue, is a very performant way to host Seaside web pages.

So, how does it work?

get a working copy of GOODS from http://www.garret.ru/goods.html and install it.
Put the following file into the directory where you want the database to reside:
```
1
0: localhost:6000
```
and call this file goods.cfg.
Enter a shell and navigate to the directory where you put the goods.cfg file. Execute goodsrv goods
Start up Squeak and install SandstoneGOODS (and all its requirements, namely GOODS, BTree, SandstoneDb).
shut down the image and restart it
You're ready to go! Use SandstoneDb as described by Ramon Leon

Performance

I can execute 1000 small commits in 30 seconds:

[ 1000 timesRepeat:[ SDChildMock new save] ] timeToRun.

I can execute one commit of 1000 small objects in 3 seconds:

[ SDActiveRecord commit:[ 1000 timesRepeat:[ SDChildMock new save ] ] ] timeToRun.

Read speed is excellent: reading 5000 objects from the db can be done in 1 second:

[SDChildMock findAll ] timeToRun--and once they're in cache, they are read in 0.1 seconds.

Altogether, I think SandstoneGOODS totally rocks and you can write your whole app using SandstoneGOODS faster than it took you before to write the ORM mapper definitions. Please tell me any bugs you find at niko.schwarz@googlemail.com. The repository allows public writing. You can find the sourcode on squeaksource.