Friday, June 5, 2009

Scalable Algorithms 3

Back to matching shortcuts to menu options, I decided my best option would be a deterministic finite automaton. It would make excellent use of partial input, and take much less space than a set of all possible inputs accepted -- to make use of state machine lingo -- by each menu option.

Now, anyone who has ever worked with DFA before will tell you that the trick to build one is to first generate the regular expression which represents what you want to match. Yes, that's your good, old RegExp. Well, not fully... you can't have back references, for one thing. But most of what you think of as regexp will do.

So, what regular expression would match our input? Let's take a small menu option: "New Tab". We want to accept either word as well as its abbreviation, so "New", "Tab" and "NT" should be matched. Any prefix of either word should be taken as well, so we add "N", "Ne", "T" and "Ta" to it. Finally, we want to combine word prefixes, like "NTab" or "NeTa", you get the idea.

So, let's get the first word, and see how far we can take. We have "N", "Ne" and "New". So "N" is mandatory, and "e" is optional. That would be "Ne?" in regexp. Next, "w" is also optional, but it can only appear if "e" appears as well. That's "N(ew?)?" in regexp, or, to make regular use of parenthesis, "N(e(w)?)?". Now, since we may as well only match the second word, the whole first word is optional, which gives us "(N(e(w)?)?)?".

Naturally, each word would follow the same pattern, but what about the whole menu option? As it happens, you only have to concatenate the patterns: "(N(e(w)?)?)?(T(a(b)?)?)?". It should be pretty simply to create such a function.

In fact, we can do it using foldRight. As you know, foldRight combines things from right to left, using the result of the last operation and the next element. Now, it may not be immediately obvious what the operation is, so let's assume we already processed "w". That means we have "(w)?" as the last result, and "e" as the next letter, and the result we want to come up with is "(e(w)?)?". It's trivial to see, then, that the operation is "("+"e"+"(w)?"+")?".

Now that we know what the operation is, let's think about the initial "result" that fold takes as parameter. We want x in "("+"w"+x+")?". Since that must be equal to "(w)?", x = "".

Ok, so now we know what the regular expression for a menu option must look like, as well as how to build one from its string. We can now proceed to building a DFA.

But... we can match regular expressions to strings just as well as DFA. I might not take advantage of partial inputs, and I can't go the next step, which is creating one DFA which will accept all menu options and, as an added bonus, have the associated menu options at each state.

Still, how hard would it be to match against regexp? Well, let's try...



val wordsep = "\\W+"
def words(s : String) = s.split(wordsep)
def phraseRegex(s : String) =
"^" + words(s).map(_.foldRight("")("(" + _ + _ + ")?")).mkString
def regexList(l : List[String]) = l.map(s => phraseRegex(s) -> s)
def regexSearcher(l : List[String]) = {
val r = regexList(l)
(s : String) => r.filter(s matches _._1).map(_._2)
}


That's it. You could, and should improve it by pre-compiling the regexp expressions. It's a bit uglier, but for the sake of doing things right, I'll post it at the end. Anyway, this is fairly inexpensive in terms of memory usage, and regexp matches are very fast. I doubt it can beat a hash tree, but it should give it a run for its money. Still, the fact that you'll be matching every menu option against the input is problematic, and the hash tree doesn't suffer that problem. The setup time for this algorithm should be pretty fast too, even precompiling the regexp.

Anyway, it's short, it's obvious, and it's not too shabby. We'll next get into DFA, and do it by using Scala's standard library. That should be the worst of all algorithms, but a stepping stone to the last one I'll present. For now, here's the revised code.



val wordsep = "\\W+"
def words(s : String) = s.split(wordsep)
def phraseRegex(s : String) =
("^" + words(s).map(_.foldRight("")("(" + _ + _ + ")?")).mkString).r
def regexList(l : List[String]) = l.map(s => phraseRegex(s) -> s)
def regexSearcher(l : List[String]) = {
val r = regexList(l)
(s : String) => r.filter(_._1.findPrefixOf(s) == Some(s)).map(_._2)
}

No comments:

Post a Comment