Saturday, April 30, 2011

Expressive Code and the Alternative Vote

One of the joys of writing code in Scala is how expressive it looks. Instead of dealing with the minutia of handling data, I can concentrate on what the code is actually doing. With experienced Scala programmers, that goes without saying. Newcomers have a harder time, because they are still a bit confused by grammar and vocabulary to pay proper attention to what is being said, so to speak.

The most striking example of that comes from the use of Scala's collection. There are many powerful collection libraries out there, but you are usually made very aware that you are handling a collection -- code to be hidden inside a class, to avoid contaminating business logic with it. Scala's collections, on the other hand, can easily fit in the highest abstraction levels of the code.

Let me take an example from the upcoming British referendum about the adoption (or not) of Alternative Vote. In this voting system, each voter ranks the candidates in order of preference. Depending on the specifics of its implementation, it may be possible or not to leave candidates unranked. The winner is the candidate that manages to get 50% of the votes, with the the candidate with the least votes being removed and his or her votes being reassigned according to the voter's preferences until some candidate gets elected.

So, let's consider how one could implement the algorithm that decides who the winner is. Let's say we have a set of candidates, identified by their names, and a list of votes. Each vote is a list of candidates ranked by preference. From that, we have to produce the name of the winner. In other words:

def elect(candidates: Set[String], votes: Seq[Seq[String]]): String = {

So, where do we begin? There are three essential tasks: we need to tally the votes for every candidate, we need to see if the candidate with most votes has at least 50% of all votes, and we need to discover who's the candidate the the least amount of votes.

Let's say we did these tasks, and now we know who the candidate with least and most votes are, and we have a tally of votes for all candidates. In this case, we can return the winner very easily, if there's one:

    if (votesByCandidate(mostVotesCandidate) >= votes.size / 2) mostVotesCandidate

Which still leave us the problem of how to handle the (expected) usual case of no candidate reaching 50% on first tally. There's an easy solution for that, though: just remove the candidate with least votes from the pool of candidates, and try to get a winner out of that. It's easy, because we already have a function that does that:

    else elect(candidates - leastVotesCandidate, votes)

Of the three tasks posed earlier, two are pretty simple as well: deciding who's the candidate with least and most votes. We could sort all candidates by vote and take the first and last candidate, but we don't actually need to sort anything: just knowing top and bottom is enough. We can do that like this:

    val mostVotesCandidate = candidates maxBy votesByCandidate
    val leastVotesCandidate = candidates minBy votesByCandidate

Now all that's left is finding how many votes each candidate has. We could pick the first preference in each vote and do a tally on that, but some candidates may have been removed from the pool. Instead let's say the valid candidates of a vote are the ranked list of candidates for a vote that are still in the running. We can compute that for a vote by doing:

    def validCandidates(vote: Seq[String]): Seq[String] = vote filter candidates

This doesn't read right, actually. A minor complain some people have about collection methods is that filter seems to do exactly the opposite of what's wanted: if you say filter X (those for which X is true), then it will keep X instead of discarding it. So, when we say "filter candidates", it will keep these candidates in the vote, and discard the rest.

The other non-obvious thing about this line is "candidates" itself. What does it mean to say "filter candidates"? Well, "filter" takes a function which, given a value of the collection, will return true or false depending on whether that value must be kept or discarded. That means "candidates" must be a function which, given the name of a candidate, returns true or false.

However, "candidates" is a set! We declared it so in the very first line of code presented, didn't we? Well, in Scala a set is also a function that tests whether a value is present in the set or not, returning true or false accordingly. In fact, sequences and maps are also functions in Scala, the former from indices to values, and the latter from keys to values.

Well, enough about that. We can now just take the first candidate in the list of valid candidates and tally the votes... as long as all candidates are ranked in each vote. If that is not the case, then a vote may well not contain any candidates still in the running, in which case the list of valid candidates will be empty.

Since this example comes from the British referendum, and the AV proposed in that referendum does not require votes to rank all candidates, we'll deal with that. Let's say the first valid candidate of a vote may be some candidate or no one. That is,

    def firstValidCandidate(vote: Seq[String]): Option[String] = validCandidates(vote) headOption

We can then use this to get a list of first choices for all votes with a valid first candidate:

    val firstChoices = votes flatMap firstValidCandidate

The votes for a candidate are the number of first choices for that candidate. We'll make a map out of it to avoid recounting that every time.

    def votesFor(candidate: String) = firstChoices count (candidate ==)
    val votesByCandidate = candidates map (candidate => candidate -> votesFor(candidate)) toMap;

Finally, we have to do something about the possibility of no candidate reaching 50%, which is possible in a system where not all candidates are ranked. I don't know how the proposed system will do in that case, but I'll just choose the most voted candidate if there aren't more than two.

With that fix in, this is what the whole code looks like:

def elect(candidates: Set[String], votes: Seq[Seq[String]]): String = {
    def validCandidates(vote: Seq[String]): Seq[String] = vote filter candidates
    def firstValidCandidate(vote: Seq[String]): Option[String] = validCandidates(vote) headOption
    val firstChoices = votes flatMap firstValidCandidate
    def votesFor(candidate: String) = firstChoices count (candidate ==)
    val votesByCandidate = candidates map (candidate => candidate -> votesFor(candidate)) toMap;

    val mostVotesCandidate = candidates maxBy votesByCandidate
    val leastVotesCandidate = candidates minBy votesByCandidate

    if (votesByCandidate(mostVotesCandidate) >= votes.size / 2 || candidates.size <= 2) mostVotesCandidate
    else elect(candidates - leastVotesCandidate, votes)

While there's a Scala oddity here and there, the code is pretty clear for all that it is doing.


  1. Why is it so obfuscated? Yes, sets and maps implement apply, but to be honest, I think you're abusing that a little in this example - they make the code a puzzle to decode rather than something to scan and grasp.

    "vote filter candidates" took me a while to figure out. The identifier "vote" would be clearer as "orderedChoices", e.g. "orderedChoices.filter(candidates.contains(_)" makes perfect sense to me. Even the original "orderedChoices filter candidates" makes more sense.

    Additionally, you should mention that Set only implements maxBy and minBy in the (unreleased) 2.9. I would move the validation out to an external wrapper - it takes extra time and code complexity to do it at every match-up.

    Sorry to be so critical but we all need to fight the stereotype of Scala as a confusing or "complex" language.

    Here's my entry (I also assume 2.9's maxBy and minBy; in this case on Map):


    type OrderedChoices = Seq[String]

    def elect(candidates : Set[String], orderedChoicesList : Traversable[OrderedChoices]) = {
    val validVotes =
    electHelper(candidates, validVotes)

    def electHelper(candidates: Set[String], orderedChoicesList: Traversable[OrderedChoices]): String = {
    val firstChoices = orderedChoicesList.flatMap(_.headOption)
    val votesByCandidate = firstChoices.groupBy(identity).mapValues(_.size) // map from candidate to vote count

    val (winner,winnerVotes) = votesByCandidate.maxBy(_._2)
    val (loser,_) = votesByCandidate.minBy(_._2)

    if( winnerVotes >= orderedChoicesList.size / 2 || candidates.size <= 2 )
    elect(candidates - loser, orderedChoicesList)

  2. Formatting got nuked. Here it is (with a little cleanup, like only calculating the loser when we don't find a winner):

  3. Also, I'm mistaken. I see now that you're not validating the candidates - you're removing ones who've been eliminated in previous rounds. So the elect/electHelper distinction can go away and the logic in my elect needs to be at the top of the current electHelper.

  4. You picked on the single line that I said didn't read right, though I blame the word "filter" on that. And while "contains" makes what happens more explicit, I much prefer to explain how maps, sets and seqs work than be redundant.

    As for the vote/orderedChoices, vote is clearly defined as "Each vote is a list of candidates ranked by preference." In fact, I think "orderedChoices" obscures the fact that it corresponds to someone's actual vote. I'd probably make vote a case class Vote(choices: Seq[String]) instead, but I don't think it adds readability here, and does add complexity.

    But, aside from these issues which are arguable, I wouldn't go with your code. I feel that mutual recursion is more confusing than simple recursion, and you loose tail recursion optimization to boot.

    My original code used groupBy/mapValue (though then did toMap to it), but I wasn't happy with that. First, "groupBy(identity)" sounds confusing. Second, it produces the list of candidates from the list of votes, which is the main source of my discontent.

    While not as efficient, declaring that the votes of a candidate is the number of votes for that candidate feels much more natural to me. As for efficiency, the number of candidates is bound to be low with AV system (or, otherwise, ranking them would be a nightmare for the voter), and I believe in not prematurely optimizing, and the magnitude of the numbers involved all point to this being a pretty fast calculation.

    Likewise, I avoided tuples, because I hate "_._2". I think it looks ugly and it is completely obscure to its meaning.

    And while I did like how you got the winner votes, that requires one to keep track of the fact that votesByCandidate is a list of tuples. I don't think the tradeoff is worth it.

    Finally, note that there isn't a single instance of "_" in my code. I don't know about you, but those underscores where my main source of unease while learning Scala.

    Oh, and I much prefer infix notation. I believe that punctuation should be used to draw attention to something, instead of being lost in the trees.

  5. First of all, I'm not using mutual recursion - you might want to read my code again. And my code is definitely tail-call optimizable...

    Second, you're talking about trivial syntactic differences - underscore versus named variable, _2 versus an explicit pattern match. These aren't my concern and I wouldn't have commented if they were. If underscores really damage your calm, be explicit:

    allVoters flatMap (choices => choices.headOption)
    votesByCandidate.maxBy{case (candidate,allVotes) => allVotes.size}

    But these are minor nitpicky differences. I'm talking about the overall flow of the program and I stand by my claim that mine is easier to follow than yours is. Using collections as the function argument to filter or maxBy is just poor form, IMO, and needlessly obscure. I don't have a preference between infix and dot-syntax - I used dots because that's my habit, not to make a point. But I would recommend getting accustomed to underscores, since they're idiomatic Scala and they're there for a reason - sometimes repeating the name of the object you're working on is noisier than just the fill-in-the-blank the underscore suggests. You got lucky that you could use container apply methods here but in the general case you're going to need the underscore.

    orderedChoices may not be the perfect name but in the expression "vote filter candidates" vote is a singular noun, which should be a giveaway that your naming might not be mapping well to the real world. Another minor point is that the list of voters doesn't have any sequence - it should be a Traversable, not a Seq; what if one day you want to pass in a Set of votes?

    groupBy(identity) is the only hairy part of my code, but it's not too bad and the benefit is that nice "val (winner,winnerCount)" destructuring later on. If you're worried about tracking the type of votesByCandidate, give it an explicit type (that's a good idea and I replaced my comment with a type assertion).

    I'm not going to sit here and compare underscore counts but here's my updated version, fixing my mistake about valid candidates and removing every single underscore:

  6. In your original code you had elect call electHelper and vice versa. That's mutual recursion and it is not optimized by Scala.

    You pick on "vote" -- though vote is explicitly defined as a list of candidates ranked by preference -- but go on to say that "allVoters" is a list of "choices". How can allVoters be a list of choices, and not voters?

    I call "votes" a list of "vote", and it is a sequence -- not a traversable or iterable -- because it *cannot* be a set. It must be able to admit multiple identical votes, and the only collection that *ensures* that is sequence. To ensure that is typechecked I made it a sequence.

    You say underscore is idiomatic Scala, which is true enough (though you could have replaced (candidates.contains(_)) with (candidate contains). However, you complain about using collections as functions. Well, I have news for you: *that* is idiomatic Scala, and you should get used to it. Collections are functions for a reason, not by coincidence.

    However, I did not write this code for people who are used to reading Scala code -- it adds nothing for them. I wrote it for people who are curious about Scala, or learning it.

    And you think that

    val (winner, winnerVotes) = votesByCandidate.maxBy{case (_,voteCount) => voteCount}

    is preferable to

    val mostVotesCandidate = candidates maxBy votesByCandidate


    You can keep it, thanks. I wish I could use a case class vote(choices: Choices), type Choices = Seq[String], but there's no place to put that -- it adds things that are not related to the problem.

    I did find your use of "find" perfect, though. I'll change my own version to use it as well.

  7. Oh, shoot, now I see what you're saying. The mutual recursion was definitely unintended. The tail call in elect was supposed to be another call to electHelper - that's a typo that arose because I moved the validCandidates check out of the elect function at the last minute and then posted it without trying it. Sorry. Moving it out was wrong on my part anyway.

    I don't have a problem with using collections as functions:

    if( friends(jimmy) ) { .... } // awesome
    a(10) = 444 // great
    if( a(0) > 10 ) { ... } // rad
    vote filter candidates // pushing it, but fine


    candidates maxBy votesByCandidate // contortionistic

    Guess we just disagree.

    I said the voters are a Traversable. Each voter is represented by a Seq of votes. Agree? We both know that the votes themselves have an order, but the outer collection ('votes' in your code) has no order.

    But yes, I think "val (winner,winnerVotes) = votesByCandidate.maxBy(_._2)" is preferable to your code. Not least because I get the winnerVotes variable along with it, which you have to retrieve later and because it's completely straightforward about what it's doing.

    I didn't want to get into a big arguing match, I guess we just disagree about how best to present Scala to newcomers.