Comments on Algorithmically challenged: A Cute Algorithm

Interesting. RE: http://analgorithmaday.blogspot....

2011-05-20T16:04:52.487-03:00

Interesting.

RE: http://analgorithmaday.blogspot.com/2011/05/count-duplicates-in-integer-arraygoogle.html

A couple of points:
1. The algorithm on that page doesn't work. It counts incorrectly:

"if you have, A[] = {1,1,1,2,2,2,3,3,4}
you should print, 1=>3, 2=>3, 3=>2, 4=>1"

The program produces:
1=>3, 2=>1, 3=>1, 4=>1

The author even states:
"I came up with a approach but cannot make that into a working code"

Now, the O(n) bound refers to the number of comparisons, not the number of recursive calls to the function. Given that, I instrumented the code to count the number of comparisons actually executed:

Input: 1,1,1,2,2,2,3,3,4
Comparisons: 9

Input: 1,2,2,3,3,3,4,4,4,5,5,5,5,6,6,6,6,7,7,8,9
Comparisons: 34

Input: 1,1,2,2,2,3,4,4,4,5,5,6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,9
Comparisons: 61

So the performance is worse than the naive approach of just walking the list. This is because the algorithm is based on guessing and the costs for bad guessing are very high.
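For reference, the naive walk can be sketched as follows. This is a minimal illustration in Scala (the language used elsewhere in this thread), not the code from the linked post; the name `countRuns` is made up for the example. It performs one comparison per adjacent pair, so n - 1 comparisons for n elements.

```scala
// Naive baseline: a single pass over a sorted array.
// `countRuns` is a hypothetical name, not code from the linked post.
def countRuns(sorted: Array[Int]): Vector[(Int, Int)] = {
  if (sorted.isEmpty) Vector.empty
  else {
    val out = Vector.newBuilder[(Int, Int)]
    var current = sorted(0)
    var count = 1
    for (i <- 1 until sorted.length) {
      if (sorted(i) == current) count += 1
      else {
        // The run of `current` has ended; record it and start a new run.
        out += ((current, count))
        current = sorted(i)
        count = 1
      }
    }
    out += ((current, count)) // record the final run
    out.result()
  }
}

val counts = countRuns(Array(1, 1, 1, 2, 2, 2, 3, 3, 4))
```

Pasted into the Scala REPL or run as a script, `counts` holds the pairs (1,3), (2,3), (3,2), (4,1), which is the output the quoted problem statement asks for.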

There may be an algorithm that can do this in better than O(n), but I haven't seen it.

Frankly, I think that the answer the interviewer is looking for is a parallel one, where this problem can be solved in O(n/m) time, with m being the number of parallel partitions.
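One way such a partitioned count might look is sketched below. Everything here is an assumption made for illustration: the helper names (`countSlice`, `joinCounts`, `parallelCounts`), the use of `Future` on the global execution context, and the 10-second timeout. Each partition is counted with a linear walk, and a run that straddles a partition boundary is summed when the partial results are joined.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Count runs of equal values in sorted(from until to) with a linear walk.
def countSlice(sorted: Array[Int], from: Int, to: Int): Vector[(Int, Int)] = {
  val out = Vector.newBuilder[(Int, Int)]
  var i = from
  while (i < to) {
    var j = i + 1
    while (j < to && sorted(j) == sorted(i)) j += 1
    out += ((sorted(i), j - i))
    i = j
  }
  out.result()
}

// Join two partial results; a run split across the boundary is summed.
def joinCounts(l: Vector[(Int, Int)], r: Vector[(Int, Int)]): Vector[(Int, Int)] =
  if (l.nonEmpty && r.nonEmpty && l.last._1 == r.head._1)
    (l.init :+ ((l.last._1, l.last._2 + r.head._2))) ++ r.tail
  else l ++ r

// Hypothetical driver: split into roughly `parts` chunks, count each in a Future.
def parallelCounts(sorted: Array[Int], parts: Int): Vector[(Int, Int)] = {
  require(parts >= 1, "parts must be at least 1")
  val size = math.max(1, (sorted.length + parts - 1) / parts)
  val futures = (0 until sorted.length by size).map { start =>
    Future(countSlice(sorted, start, math.min(start + size, sorted.length)))
  }
  val partials = Await.result(Future.sequence(futures), 10.seconds)
  partials.foldLeft(Vector.empty[(Int, Int)])(joinCounts)
}

val parallelResult = parallelCounts(Array(1, 1, 1, 2, 2, 2, 3, 3, 4), 2)
```

With two partitions the run of 2s is split across the boundary, and the join step restores the count of 3. The total work is still linear; only the wall-clock time per partition shrinks.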

RE: definition of k-smallest

That's fine. That particular definition is not one I've encountered; the one I proposed, on the other hand, is one I encounter quite frequently. Different strokes for different folks, I guess.

@corruptmemory There are two issues here: whether...

2011-05-15T21:28:02.038-03:00

@corruptmemory

There are two issues here: whether my interpretation of k-smallest number is correct or not, and how to adapt the algorithm to make it work according to your interpretation.

The k-smallest number is formally known as the k-th order statistic. It is usually defined over distinct numbers, but, for a list of n numbers, the minimum is the 1st order statistic, the maximum is the n-th order statistic, and, if n = 2m + 1, the median is the (m+1)-th order statistic. Given that, I feel my definition is correct -- or, at least, more common.

Regardless, there's the question of how to adapt the algorithm to meet your own definition. First, I can guarantee that you won't get better than O(min(k, log N+M)), since you'll need to "confirm" at least k numbers to be distinct.

However, you _can_ get that performance, since one can count the number of distinct elements in a list in O(distinct numbers). In fact, the very same blog that inspired this post has that algorithm as well: http://analgorithmaday.blogspot.com/2011/05/count-duplicates-in-integer-arraygoogle.html

So, one possible solution is the following. First, adapt that algorithm to just store the numbers it finds in an array. It only needs to collect the first k distinct numbers; once it has them, it can stop.

Now apply this algorithm to each of the two sorted arrays. Finally, run my algorithm over the output of the previous step, which is guaranteed to only contain distinct numbers.

Seems to me that the performance should be O(k), or, perhaps, O(k log k).
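A rough sketch of that pipeline is below, with two loudly stated substitutions. `firstKDistinct` uses a plain linear scan rather than the linked post's algorithm, so it costs as many steps as it takes to reach the k-th distinct value, not O(k). `kthDistinct` is a simple merge walk standing in for the selection algorithm from the post; it also skips values shared between the two arrays, since deduplicating each array separately does not remove those. Both names are hypothetical.

```scala
// Step 1: collect at most k distinct values from the front of a sorted array.
// Linear scan used as a stand-in; not the algorithm from the linked post.
def firstKDistinct(sorted: Array[Int], k: Int): Vector[Int] = {
  val out = Vector.newBuilder[Int]
  var found = 0
  var i = 0
  while (i < sorted.length && found < k) {
    if (i == 0 || sorted(i) != sorted(i - 1)) {
      out += sorted(i)
      found += 1
    }
    i += 1
  }
  out.result()
}

// Step 2: k-th smallest distinct value across two sorted, duplicate-free inputs.
// Returns None when fewer than k distinct values exist.
def kthDistinct(a: Vector[Int], b: Vector[Int], k: Int): Option[Int] = {
  var i = 0
  var j = 0
  var seen = 0
  var last = 0
  while (i < a.length || j < b.length) {
    val takeA = j >= b.length || (i < a.length && a(i) <= b(j))
    val next = if (takeA) a(i) else b(j)
    if (takeA) i += 1 else j += 1
    // Count a value only the first time it appears in either input.
    if (seen == 0 || next != last) {
      seen += 1
      last = next
      if (seen == k) return Some(next)
    }
  }
  None
}

// The counterexample arrays raised elsewhere in this thread, with k = 3.
val m = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3)
val n = Array(1, 1, 1, 1, 1, 1, 1, 1, 1)
val third = kthDistinct(firstKDistinct(m, 3), firstKDistinct(n, 3), 3)
```

Here `third` is `Some(3)`. Note that on this input the linear scan in step 1 reads all of `m` and all of `n`, so the overall bound depends entirely on how cheaply the distinct values can be extracted.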

Correction: m.length tests, sorry.

2011-05-15T20:12:57.024-03:00

Correction: m.length tests, sorry.

Well, the point I was getting at is that the solut...

2011-05-15T20:01:42.834-03:00

Well, the point I was getting at is that the solution provided only works if the two sublists to be merged do not contain duplicate elements before the k-th element (and, of course, you may not know this before looking for the k-th element). So a general, and I would argue correct, solution needs to handle duplicate elements correctly. I do not see a solution that is O(k) for the general case. Take, for example:

val k = 3
val m = Array(1,1,1,1,1,1,1,1,1,2,3)
val n = Array(1,1,1,1,1,1,1,1,1)

This particular problem will require m.length+n.length tests as best as I can tell.

@corruptmemory, if you allow duplicate elements t...

2011-05-15T03:53:07.181-03:00

@corruptmemory, if you allow duplicate elements, the elementary action "get the k^{th} element of the sorted array" requires at least O(min(k,n-k)). To see why, imagine you checked all elements 1..k and found them all distinct. In that case, ar[k] is the k^{th} element if ar[0]≠ar[1], and ar[k+1] is the k^{th} element if ar[0]=ar[1], so no position can be skipped.

You can keep the same algorithm, though, and use the O(k) method described above to find the k^{th} element instead of just applying ar(k).

Perhaps this is stated in the original problem, bu...

2011-05-15T02:08:32.271-03:00

Perhaps this is stated in the original problem, but your solution assumes that the 'k'-th smallest is based on "position" within the array instead of the 'k'-th smallest number overall.

Limiting the example to 1 array, there are two interpretations of 'k' smallest:

k=3
Array(1,1,1,2,2,4,4,5,5,7,7,8)

Given your solution you would choose 1 as the k-smallest integer whereas I would argue that the k-smallest is 4.
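The two readings can be made concrete with a short sketch. The value names `byPosition` and `byDistinctValue` are made up for this illustration, and `distinct` is used only for clarity, not efficiency.

```scala
val k = 3
val arr = Array(1, 1, 1, 2, 2, 4, 4, 5, 5, 7, 7, 8)

// Interpretation 1: k-th smallest by position, so duplicates count separately.
val byPosition = arr(k - 1)

// Interpretation 2: k-th smallest among the distinct values (1, 2, 4, 5, 7, 8).
val distinctValues = arr.distinct
val byDistinctValue = distinctValues(k - 1)
```

For this array `byPosition` is 1 and `byDistinctValue` is 4, matching the two answers described above.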