Monday, June 1, 2009

Scalable Algorithms

It's common knowledge that the best way of learning a new language is writing a program you wanted in it. Or, perhaps I should say, that's the quickest way of learning a language -- you should beware of the gaps in knowledge of the language that can result from this approach.

Well, I was trying to do that with Scala, but I always stopped because there was something more I wanted to learn before continuing with one project or other. It was then that a friend posed a challenge, and that finally broke the programmer's block I was having.

He has written a small program -- in Scala -- with few but loyal followers to help in his gaming sessions, and he wanted something to help with fast keyboard selection. His idea was having an algorithm that would search available menu options based on a few input characters, which would typically stand for an abbreviation of that menu.

As I took the challenge, I formally specified the algorithm requirement to the following: given a list of phrases (representing menu options) and a string of characters (representing user input), select all phrases for which exists a combination of prefixes of arbitrary length whose concatenation equals the string of characters.

Preferably, order the result too using some kind of heuristics.

Well, I wound up writing many solutions to the problem. Of course, there are MANY solutions to the problem -- my friend himself wrote something completely different from any of mine. But I discovered real joy in programming with Scala as I wrote them. So, let me share them with you.

List decomposition

The first concept I came up with was essentially based on lists. Scala has brought me back with full force to the lists algorithms I used once with functional languages.

I'll follow the code with my own thoughts about it. But, first, I want to explain the basic concept. First, I transform each phrase into a list of words, each word being a list of characters. The user input also gets transformed into a list of characters. Next, for each phrase I try possible matches against the input, and return a score.

This is where it gets recursive. After matching each character to the first character of the first word in the list, I select the better of the next two possible options: matching the next character in the word, or the first character in the next word.

The heuristics I used here gives preference first for matches that consume the whole input, and, next, to matches that match most of the words in the input. As such, and keeping in mind abbreviations are the most likely input, I use a function which receives the recursion as by name parameters, and only computes the second option if the first one doesn't return a perfect match (all input consumed, no words left unmatched).

Different from the algorithms wrote next, you can't skip words when matching. Also note I severely shortened the comments, as they got a bit too lengthy for a blog.

// Given two tuples formed by the length of unmatched input and
// the number of unmatched words, choose the "best" of them.
def isBetterThan(a: (Int, Int), b: (Int, Int)) : Boolean = if (a._1 == b._1) {
if (a._2 < b._2) true else false
} else {
if (a._1 < b._1) true else false

// Receives two expressions by nam, who, in turn, return a tuple representing
// matching efficacy. Evaluates the first expression and, if the efficacy is
// not the best possible, evaluates the second expression and return the better
// one.
def bestOf(aL: => (Int, Int), bL : => (Int, Int)) : (Int, Int) = {
val a = aL
if( a == (0,0)) a
else {
val b = bL
if (isBetterThan(a, b)) a else b

// Try to match the first character of input and menu option. Recurse if
// succesful.
def matchInputToOption(input : List[Char], op : List[List[Char]]) : (Int, Int) = {
(input, op) match {
case (in1 :: Nil, (op11 :: op1s) :: ops) if (in1 == op11) =>
(0, ops.length)
case (in1 :: ins, (op11 :: op1s) :: ops) if (in1 == op11) =>
bestOf(matchInputToOption(ins, ops),
matchInputToOption(ins, op1s :: ops))
case (_, _) =>
(input.length, op.length)

// Listify inputs, zip menu options with computed matches, filter for
// matches which consume the whole input, sort them, return.
def searchOptions(input : String, menu : List[String]) : List[(String, Int)] = {
val inputList = input.toList
val menuList =" ")

zip (menuList map (menuOp => matchInputToOption(inputList, menuOp)))
filter (_._2._1 == 0)
sort ((a,b) => isBetterThan(a._2,b._2))
map (x => (x._1, x._2._2))

This algorithm was born of matchInputToOption. I was charmed by the flexibility of match, and thinking myself oh-so-clever by realizing I could call tail on the list of words or the list of characters of the head of list of words as the only two alternatives after each match. All the rest as created to make this function work.

I do think that function is clever, and it does go to show Scala's power. The algorithm, itself, can go directly to the nearest trash can. It is easy to check it's correctness -- and please note it only works with immutable objects -- but that isn't enough to make up for its many failings.

First, I do not precompute anything, and I do require some work transforming all those strings into lists. If Scala is using projections to do it, it might not be all that bad though.

Next, it has no memory of previous matches. Each time it's called, it matches the whole input. If you happen to feed it character by character, this becomes very wasteful.

Memory-wise... well, that depends on that projection thingy. If split and toList are being strict (precomputing everything), then we have lists of characters, which are very wasteful.

It is rather fast, but there are faster ones.

And, finally, it's not particularly pretty to look at or easy to understand.

In my next post I'll show a truly memory wasteful algorithm, but a very quick one.

No comments:

Post a Comment