Archive for December 2008

 
 

Morning coffee, 15 lines

This morning I decided to use the reduce function in lieu of the missing ?? operator I talked about yesterday. This shaved off another line:

#light

open System.Text.RegularExpressions

let (+..) = Seq.append

let edits1 (w:string) =
    seq { for i in { 0 .. w.Length - 1} -> w.Remove(i, 1) } +..
    seq { for i in { 0 .. w.Length - 2 } -> w.Remove(i + 1, 1).Insert(i, w.[i + 1].ToString()) } +..
    seq { for i in { 0 .. w.Length - 1 } do for c in { 'a' .. 'z' } -> w.Remove(i, 1).Insert (i, c.ToString()) } +..
    seq { for i in { 0 .. w.Length } do for c in { 'a' .. 'z' } -> w.Insert (i, c.ToString()) }

let re = Regex ("[a-zA-Z]+", RegexOptions.Compiled)
let words = System.IO.File.ReadAllText "big.txt" |> re.Matches |> Seq.cast |> Seq.map (fun (x:Match) -> x.Value.ToLower())
let wordMap = words |> Seq.count_by id |> Map.of_seq

for arg in Seq.skip 1 Sys.argv do
    let cor = [arg |> Seq.singleton; edits1 arg; edits1 arg |> Seq.map_concat edits1]
              |> Seq.map (Seq.filter wordMap.ContainsKey) |> Seq.reduce (fun x y -> if Seq.is_empty x then y else x)
    printf "%s\n" (if Seq.is_empty cor then arg else cor |> Seq.max_by (fun w -> wordMap.[w]))

Changes from yesterday’s code:

1. Simplified the edits1 function using string Insert and Remove functions. This makes the lines shorter.

2. The cascading cor1 –> cor2 –> cor3 has been replaced by a call to reduce.

3. Replaced the call to Seq.sort_by followed by a call to Seq.hd by Seq.max_by. I’m a moron.

4. The program now attempts to correct all passed arguments, one by one. It bugged me that the code would break for no command-line arguments because it was accessing Sys.argv.[1].

Notes:

1. The edits1 function can definitely be shaved by one line. But limitations of either F# or my knowledge are preventing from doing it in an elegant manner. To be continued…

2. In addition to standard implicit conversions (int to float, etc.), all languages should have an implicit conversion from char to string. I know that the ML language family eschews implicit conversions because of type inference, but it still strikes me as wrong. C++ has shown that it’s possible to have both implicit conversions and type inference (which it does for generic function calls), and though it does one or the other and the Koeing lookup complicates the language definition, it’s still both useful and safe.

Norvig’s spelling corrector in 16 lines

[Update: I’ve trimmed the code to 15 lines. See the next post.]

I need to confess a dirty secret of mine: I like language shoot-outs.

image I know they are not very scientific. I know they don’t show real-world code. I know that the results depend on the skill of individual contributors. But whether shoot-outs compare speed or brevity I still love them, for a simple reason: They let me see a lot of languages in action at once. If you look at the results as an order of magnitude thing rather than as an exact comparison, they become telling.

Peter Norvig (of the Teach Yourself Programming in Ten Years fame) recently posted a rough explanation of how Google’s spelling corrector works. Since I did something similar ten ago with the Serbian language (it’s depressing how statistics on large input samples can beat experts), I found this interesting.

As it goes on the internets, a lot of people couldn’t resist and rewrote Peter’s sample in their language of choice. While looking at the results it struck me as wrong that F# would do so much worse than C#. So when Ana and the kids went to bed last night I rewrote Lorenzo Stoakes’s C# code in sixteen lines of F#:

#light

open System.Text.RegularExpressions

let (+..) = Seq.append

let edits1 (w:string) =
    seq { for i in { 0 .. w.Length - 1} -> w.Substring (0, i) + w.Substring (i + 1) } +..
    seq { for i in { 0 .. w.Length - 2 } -> w.Substring(0, i) + w.Substring(i + 1, 1) + w.Substring(i, 1) + w.Substring(i + 2) } +..
    seq { for i in { 0 .. w.Length - 1 } do for c in { 'a' .. 'z' } -> w.Substring(0, i) + c.ToString() + w.Substring(i + 1) } +..
    seq { for i in { 0 .. w.Length - 1 } do for c in { 'a' .. 'z' } -> w.Substring(0, i) + c.ToString() + w.Substring(i) }

let re = Regex ("[a-zA-Z]+", RegexOptions.Compiled)
let words = System.IO.File.ReadAllText "big.txt" |> re.Matches |> Seq.cast |> Seq.map (fun (x:Match) -> x.Value.ToLower())
let wordMap = words |> Seq.count_by id |> Map.of_seq
let arg = Sys.argv.[1]

let cor1 = arg |> Seq.singleton |> Seq.filter wordMap.ContainsKey
let cor2 = if Seq.is_empty cor1 then edits1 arg |> Seq.filter wordMap.ContainsKey else cor1
let cor3 = if Seq.is_empty cor2 then edits1 arg |> Seq.map_concat edits1 |> Seq.filter wordMap.ContainsKey else cor2

(if Seq.is_empty cor3 then arg else cor3 |> Seq.sort_by (fun w -> - wordMap.[w]) |> Seq.hd) |> printf "%s\n" 

A simple run from the command line shows it in action:

PS C:\vsp\spellfs\bin\Debug> ./spellfs droylea
doyle

Some notes, in no particular order:

1. I have a nagging feeling that the edits1 function can be rewritten to be much shorter, but I can’t put my finger on it just yet.

2. The cor1->cor2->cor3 sequence makes me go yuck. I need to check whether the F# standard library has an equivalent of the C# ?? operator that would also work on sequences. I could add it here but it would increase the line count. (Lousy measure, I know, but I’m a competitive bastard.)

3. Somebody needs to rewrite this in one line of K and the go “nya nya nya” to all of us.

4. The recipe for winning the brevity shoot-outs is quite simple: a) be functional, b) have a large standard library with a lot of data structures and a shitload of algorithms, and c) have a lot of operators. If you don’t overdo it this is also a good recipe for succeeding in the real world. A lot of places in the code and not performance-critical.

I, for one, welcome our ray-traced overlords

Prompted by a blog post I watched Resident Evil: Degeneration last night. The computer-generated movie is so-so, so despite the current astronomical 9.0 rating I can’t really recommend it. However, what impressed me was the quality of the animation.

In some short scenes you can’t tell it’s computer-generated. I could show you a frame from this movie and a frame from the “normal” Resident Evil movie and you couldn’t tell which one had real humans and which one had computer-generated ones.

The two glaring problems were the human skin and walking. Other than that the level of almost-realistic detail is amazing.

So what happens in ten or twenty years when the computing power goes up between thousand and a million times? (And remember, ray-tracing and movie frames are easy to parallelize so the Moore’s law is going to hold for them.) Add technologies like this or this and it’s clear that both the quality of facial animations and textures will dramatically improve.

Which leads me to the following question: will there still be a need to actors in twenty years? Will the movie studios generate their own actors instead of paying them $20 million a pop even for sequels? They’ll probably still need somebody actor-ish to come up with distinct character personas (think Brad Pitt in 12 Monkeys or Heath Ledger in the Dark Knight), but it can be a fat bald guy instead of a rare individual that has both the good looks and the acting skill