Thursday, May 19, 2011

Regex Again

I have been thinking about regex lately. I have never felt comfortable with how Scala regex works, but I could never settle on what should be done about it. Recently, I have found myself thinking of a regex more and more like this:

class RegexF(pattern: String) extends String => Option[Seq[String]]

or, perhaps,

class RegexPF(pattern: String) extends PartialFunction[String, Seq[String]]

In fact, RegexPF.lift would (could) yield a RegexF. It then caught my attention that RegexF.apply has essentially the same signature as Regex.unapplySeq (which returns Option[List[String]]), the standard way of handling regex in Scala!
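To make the relationship concrete, here is a minimal sketch that builds both views on top of the existing Regex.unapplySeq. The names RegexAsFunction, regexF and regexPF are hypothetical helpers invented for illustration; only Regex.unapplySeq and Function.unlift come from the standard library.

```scala
import scala.util.matching.Regex

object RegexAsFunction {
  // Hypothetical helper: view a Regex as String => Option[Seq[String]].
  // This is just unapplySeq under another name.
  def regexF(r: Regex): String => Option[Seq[String]] =
    s => r.unapplySeq(s)

  // Hypothetical helper: the PartialFunction view, obtained by
  // "unlifting" the Option-returning function above.
  def regexPF(r: Regex): PartialFunction[String, Seq[String]] =
    Function.unlift(regexF(r))
}
```

Calling regexPF(r).lift gives back a function with the same shape as regexF(r), which is the round trip described above.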

Might this be what has been bugging me about Scala's regex all along? Should we translate

val YYYYMMDD = """(\d{4})-(\d{2})-(\d{2})""".r
val MMDDYYYY = """(\d{2})/(\d{2})/(\d{4})""".r

def getYear(s: String) = s match {
    case YYYYMMDD(year, _, _) => year
    case MMDDYYYY(_, _, year) => year
}

into

val YYYYMMDD = """(\d{4})-(\d{2})-(\d{2})""".r
val MMDDYYYY = """(\d{2})/(\d{2})/(\d{4})""".r andThen (fields => fields.last +: fields.init)

def getYear(s: String) = ((YYYYMMDD orElse MMDDYYYY) andThen (_.head))(s)

I can certainly see the advantages of pattern matching, but... it doesn't compose very well. And it has some performance issues, which is a big deal for most regex usages. And being a PartialFunction would not prevent a Regex from having extractors as well.
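Since the standard Regex is not a PartialFunction, the translated version needs a wrapper to actually compile. The following is one possible sketch, assuming a hypothetical RegexPF class that delegates to Regex.unapplySeq; the RegexCompose object and the rotate and first values are likewise names made up for this example.

```scala
import scala.util.matching.Regex

object RegexCompose {
  // Hypothetical wrapper (not part of the standard library):
  // a regex seen as a partial function from input to captured groups.
  final class RegexPF(pattern: String) extends PartialFunction[String, Seq[String]] {
    private val regex: Regex = pattern.r

    // Defined only when the whole input matches the pattern.
    def isDefinedAt(s: String): Boolean = regex.unapplySeq(s).isDefined

    def apply(s: String): Seq[String] =
      regex.unapplySeq(s).getOrElse(throw new MatchError(s))
  }

  // Plain function values keep the andThen calls unambiguous.
  val rotate: Seq[String] => Seq[String] = fields => fields.last +: fields.init
  val first: Seq[String] => String = _.head

  val YYYYMMDD: PartialFunction[String, Seq[String]] =
    new RegexPF("""(\d{4})-(\d{2})-(\d{2})""")

  // Move the trailing year to the front so both formats share a layout.
  val MMDDYYYY: PartialFunction[String, Seq[String]] =
    new RegexPF("""(\d{2})/(\d{2})/(\d{4})""") andThen rotate

  val getYear: PartialFunction[String, String] =
    (YYYYMMDD orElse MMDDYYYY) andThen first
}
```

With this in place, RegexCompose.getYear("2011-05-19") and RegexCompose.getYear("05/19/2011") both return "2011", and getYear.lift turns a non-matching input into None instead of throwing a MatchError.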

2 comments:

  1. So this does not change my little life at all... but it is a great thing to think about.

  2. Daniel, I thought the same thing. But with a little bit of help you can compose regexes and other extractors this way:

    val Year = pattern {
    case YYYYMMDD(year, _, _) => year
    case MMDDYYYY(_, _, year) => year
    }

    text match { case Year(y) => y ... }

    I have an old blog about this and similar ideas:

    http://notes.langdale.com.au/Querying_a_Dataset_with_Scala_s_Pattern_Matching.html

    Cheers,
    Arnold
