Niko Schwarz's science and programming: September 2009

ThothCompiler combines the strengths of the two most popular smalltalk compilers. It takes the extensibility and wealth of the AST implementation from the NewCompiler and combines it with the most important asset of the old Compiler that per default ships with Squeak: it actually works.

Installation

The installation is fantastically easy. Go to Thoth on Squeaksource and get ASTBridgeLoader.

In a workspace, then execute:

ASTBridgeLoader load

Add the method compilerClass to the CLASS side of your class to set the compiler for the class. Thoth installs itself as the default compiler of the image. If you're unhappy with that, for example because you don't like the lack of error messages when you accept uncompilable code, you can choose for each class which compiler is to be used. Behavior>compilerClasss sets the default.

How do I hack into it?

Subclass ThothCompiler and override #compile:in:classified:notifying:ifFail: and #evaluate:in:to:notifying:ifFail:logged: Transform the AST before generate is called. Perhaps I should provide some infrastructure here …

How it works

It is a really simple thing: It uses the parser of the NewCompiler, then transforms the AST into the old model, and then uses the backend of the old compiler to compile.

The solution to problem 59 is perhaps helpful to see what Mathematica is good at and how it basically works. The task is here. Let me summarize it. The task contains a link to a text file full of numbers. These numbers together are an encrypted text, the password being three lowercase letters. The decryption works like this: You convert the password into its ASCII values, and then XOR the first byte of the encrypted text with the ASCII value of the first letter in the password. You iterate this for all three letters, and then start over with the first letter of the password for the fourth number in the encrypted text. And so forth, until you are done. The task is to find the password.

Ok, so, the first thing Mathematica rocks at is importing. Here it probably isn't a big deal, but I find it noteworthy anyhow … getting the encrypted file is this:

text 
  = Import["http://projecteuler.net/project/cipher1.txt", "csv"]  
  //  First ;

Here, Import leaves a pair of braces around the real result, which the First function disposes of.

Now the idea of my checking is this. I'll take every third character from the encrypted file and store it in a list. Then I will try for each possible first letter of the password which letter would yield the most printable characters in the decrypted text. I will use this very narrow definition of a printable character:

 printable[a_] := 65 <= a <= 90 
  ~Or~ 97 <= a <= 122

One thing we cannot see here in Blogger is that Mathematica allows Mathematical typesetting, thus presenting this definition with a beautiful vee for an or and the inequalities look like inequalities.

The next task is to give each letter of the password a score. Let us assemble the procedure from the ground up. The goal is to build a function which takes every third element of the encrypted text, applies BitXor with the key, and then counts how many characters are displayable.

Getting every third element from the encrypted text, starting with the 1^st letter, is expressed as

text[[1 ;; ;; 3]]]

We can refer to the result of the previous computation as %. Let us try 103 as the first letter of the password, that would be the letter "g." Then we decrypt as follows:

BitXor[103,%].

And finally, we count the printable letters in the resulting list. Mathematica provides a Count function. Count is a nice function to demonstrate something rather unique to Mathematica: patterns. Let us first see our code, which counts printable letters:

Count[%, _?printable].

Here, _?printable is the pattern. The underscore stands for "anything", and the question mark specifies that the anything must evaluate to true when passed into the following function, printable. This is a very simple pattern, you can do cooler things, like the pattern {1,2,_}, which matches lists of three elements whose first two elements must be 1 and 2.

Now altogether the score function looks like this, where i ranges between 1 and 3:

score[i_, letter_] := 
 Count[BitXor[letter, text[[i ;; ;; 3]]] , _?printable]

The task states that valid candidates for the letters of the password must be lowercase. These have ASCII codes between 79 and 122. The function SelectMaximizer is an invention of mine. It chooses the element of a list which maximizes a function that can be supplied. The supplied function will be score, of course, so we can find the first letter as

SelectMaximizer[Range[97, 122], 
  score[1, #] &]

The confusing thing here is the closure, score[1, #] &. In Mathematica, everything that ends in an ampersand is a closure, or a local function. The hash symbol is the argument. We need it because score accepts two arguments (the position of the letter in the password, and the actual letter). Where we'd like to use only the second as an argument.

The function call will dutifully find the character "g" as the first letter of the password. To find all 3, we use the Table function. The Table function an expression with an undefined variable, i, and then tell it to use that variable as an iterator, assuming values from 1 to 3. The following code thus finds all three letters of the password:

password = 
  Table[ SelectMaximizer[Range[97, 122], 
  score[i, #] &], {i, 3}];

To decrypt the text with our password, we partition the encrypted text into runs of three, which we can then BitXor. Mathematica ships with a Partition function.

Partition[text, 3] returns a list partitions of length 3 of text text. Map function applies a closure to each element of a list. For example, Map[Range[4],#*#&] computes the first four square numbers.

Our first shot at a decoding function is thus:

Map[BitXor[password, #] & , Partition[text, 3]

We almost get it right. Unfortunately, the encrypted text's length is not divisible by three, and thus partition just 'swallows' the last letter. We can tell Partition to attach the remaining letter to the list. We use a little bit of trickery, because simply attaching the last letter would attach a list of length 1 to the list of partitions, but our partitions must have length 3 for the BitXor to work. Well, we can tell Partition to add the left-overs, and then pad to the length of 3 with the password itself, which will then become 0 after the Xor, so we're really appending null bytes to the string.

Together, the decoding function is now

decode[p_] := Map[BitXor[p, #] & ,
   Partition[text, 3, 3, 1, p]]

where p stands for password.

For what it's worth, the password is "god" and the result is chapter one of the Gospel of John. The task still requires us compute the sum of the ASCII codes of the decoded text. Our decode function really returns a nested list, to compute the sum of all entries, we could either use Flatten to reduce it to depth 1, or we could tell the Total function to operate on the second level of a nested list.

Total[decode[password], 2]

which returns 107359.

The complete code is

text = 
Import["http://projecteuler.net/project/cipher1.txt",
   "csv"] //  First ;

printable[a_] := 65 <= a <= 90 ~Or~  97 <= a <= 122

score[i_, letter_] := 
 Count[BitXor[letter, text[[i ;; ;; 3]]] , _?printable]

decode[p_] := Map[BitXor[p, #] & , 
  Partition[text, 3, 3, 1, p]]

password = 
  Table[ SelectMaximizer[Range[97, 122], 
  score[i, #] &], {i, 3}];

Total[decode[password], 2]

The complete code is can be downloaded as a complete code for problem 59 Mathematica notebook or as a complete code for problem 59 pdf file.

Niko Schwarz's science and programming

Wednesday, September 30, 2009

Introducing ThothCompiler

Installation

How do I hack into it?

How it works

Wednesday, September 16, 2009

Blocks in Smalltalk can have their own temporary variables

Sunday, September 6, 2009

Explaining Mathematica with the Solution for Problem 59 from Project Euler