Sunday, May 17, 2009

Use relational databases and lock yourself out from your own data

Searching a certain pattern in a graph of your data is the most exciting task. I learned that by writing a Master's thesis on algorithms that solve NP-hard problems. I know that years of research have gone into people trying to come up with smart algorithms that try to answer really difficult questions on graphs. I also know that researchers carefully take care that each of these graph problems really stands for a computation problem of great significance to real life computing. Graph algorithms help you find travelling routes, understand the preferences of your customers, and help you find the optimal arrangement of your office.

I am talking about graphs defined by your data. If your application has interesting objects, then maybe they form an interesting graph and perhaps this way of thinking brings you a fantastically fast algorithm that serves your business logic.

If you have an interesting query on your data, it may not even be POSSIBLE to formulate the query in SQL. Such a query might be: I have a band, give me the cluster of similar bands!

A relational database promises you to solve all your data searching problems if you give it all your data. That's a REALLY big promise. In fact it's esoteric. Why would a search path optimizer be able to do what otherwise researchers write papers on? If you have an interesting SQL query, then I bet you that your database is rather slow in answering it. Perhaps it hides it by computing the answer to the query long before the query actually shows up, but the cost is still quite high.

For interesting queries on your data, I suggest you do not trust the esoteric and thoroughly overrated RDBMS query optimizers. Trust your brain and come up with a smart graph algorithm. If you use a relational database, you have locked yourself out from your own data. That is because getting a single object that equals one data row out in a RDBMS takes typically about 20 ms. If you navigate through the graph of your objects and each step costs you 20 ms, you know just exactly how many steps through your graph are still ok if you plan to deliver the rendered website in less than a second.

Object databases, instead, show a fantastic speed when you are navigating through the object graph. Give object databases a shot and find your web application soaring all of a sudden. And let's stop pretending that RDMBSs are easy. I've spent more time than I'd like to admit pondering about the access paths that a query optimizer chose and just how exactly to nudge it to to something smarter. Which index to create? How much memory for what? RDBMSs don't think for you, they force to first ponder a smart access strategy and then ponder how to nudge the RDBMS into taking it.

Relational databases are slothful, bulky, and overall a great waste of time. Yours and the computer's. I avoid them wherever I can and I recommend you to do the same.

2 comments:

  1. There is a reply to this post here.

    ReplyDelete
  2. I think your criticisms are really with SQL and not the relational model of data which is based on both predicate logic and mathematics. SQL is not faithful to the relational model for more reasons than can be written in a short comment. In the original model as envisioned by Codd the logical aspect of the DB is separated from its physical implementation not like SQL which confuses the logical and physical. Google for Professor Hugh Darwen's web page on the third manifesto. It is a wealth of information about the original model.

    ReplyDelete

Note: Only a member of this blog may post a comment.